From 3c0139c9a2ad9d46e4740e3ba7fd81c509315cfb Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 9 May 2023 09:04:44 -0700 Subject: [PATCH 01/28] Major README updates --- README.md | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 79 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index d92f997..a187294 100644 --- a/README.md +++ b/README.md @@ -5,15 +5,30 @@ This workshop is based on a significantly simplified and modified version of the ## Setup Instructions - Install [Visual Studio Code](https://code.visualstudio.com/). + - Install the [CodeQL extension for Visual Studio Code](https://codeql.github.com/docs/codeql-for-visual-studio-code/setting-up-codeql-in-visual-studio-code/). + - Install the latest version of the [CodeQL CLI](https://github.com/github/codeql-cli-binaries/releases). + - Clone this repository: ```bash git clone https://github.com/kraiouchkine/codeql-workshop-runtime-values-c ``` -- Install the CodeQL pack dependencies using the command `CodeQL: Install Pack Dependencies` and select `exercises`, `solutions`, `exercises-tests`, and `solutions-tests` from the list of packs. -- If you have CodeQL on your PATH, build the database using `build-database.sh` and load the database with the VS Code CodeQL extension. + +- Install the CodeQL pack dependencies using the command `CodeQL: Install Pack + Dependencies` and select `exercises`, `solutions`, `exercises-tests`, `session`, + `session-db` and `solutions-tests` from the list of packs. + +- If you have CodeQL on your PATH, build the database using `build-database.sh` + and load the database with the VS Code CodeQL extension. It is at + `session-db/cpp-runtime-values-db`. - Alternatively, you can download [this pre-built database](https://drive.google.com/file/d/1N8TYJ6f4E33e6wuyorWHZHVCHBZy8Bhb/view?usp=sharing). + +- If you do **not** have CodeQL on your PATH, build the database using the unit + test sytem. 
Choose the `TESTING` tab in VS Code and run the + `session-db/DB/db.qlref` test. The test will fail, but it leaves a usable CodeQL + database in `session-db/DB/DB.testproj`. + - :exclamation:Important:exclamation:: Run `initialize-qltests.sh` to initialize the tests. Otherwise, you will not be able to run the QLTests in `exercises-tests`. ## Introduction @@ -63,12 +78,22 @@ This workshop is not intended to be a complete analysis that is useful for real- The goal of this workshop is rather to demonstrate the building blocks of analyzing run-time values and how to apply those building blocks to modelling a common class of vulnerability. A more comprehensive and production-appropriate example is the [OutOfBounds.qll library](https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll) from the [CodeQL Coding Standards repository](https://github.com/github/codeql-coding-standards). -## Exercises +## Session/Workshop notes +Unlike the [exercises](#org3b74422), which use the *collection* of test +problems in `exercises-tests`, a workshop follows `session/session.ql` and uses a +*single* database built from a single, larger segment of code. + + +## Exercises +These exercises use the collection of test problems in `exercises-tests`. + ### Exercise 1 In the first exercise we are going to start by modelling a dynamic allocation with `malloc` and an access to that allocated buffer with an array expression. The goal of this exercise is to then output the array access, buffer, array size, and buffer offset. The [first test-case](solutions-tests/Exercise1/test.c) is a simple one, as both the allocation size and array offsets are constants. +For this exercise, connect the allocation(s), the array accesses, and the sizes in each. + Run the query and ensure that you have three results. #### Hints @@ -76,7 +101,14 @@ Run the query and ensure that you have three results. 2. 
Use `DataFlow::localExprFlow()` to relate the allocated buffer to the array base. ### Exercise 2 -This exercise uses the same C source code, duplicated for the test [here](solutions-tests/Exercise2/test.c). +This exercise uses the same C source code with an addition: a constant array size +propagated [via a variable](solutions-tests/Exercise2/test.c). + +Hints: +1. Start with a plain `from...where...select` query. +2. Use + `elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()` +3. Convert your query to a predicate or use classes as outlined below, if desired. #### Task 1 With the basic elements of the analysis in place, refactor the query into two classes: `AllocationCall` and `ArrayAccess`. The `AllocationCall` class should model a call to `malloc` and the `ArrayAccess` class should model an array access expression (`ArrayExpr`). @@ -90,6 +122,8 @@ Use local data-flow analysis to complete the `getSourceConstantExpr` predicate. ### Exercise 3 This exercise has slightly more C source code [here](solutions-tests/Exercise3/test.c). +Note: `test_const_branch` accesses `buf[100]` where `size` may be 100. + Running the query from Exercise 2 against the database yields a significant number of missing or incorrect results. The reason is that although great at identifying compile-time constants and their use, data-flow analysis is not always the right tool for identifying the *range* of values an `Expr` might have, particularly when multiple potential constants might flow to an `Expr`. The CodeQL standard library several mechanisms for addressing this problem; in the remainder of this workshop we will explore two of them: `SimpleRangeAnalysis` and, later, `GlobalValueNumbering`. @@ -110,6 +144,12 @@ Implement the `isOffsetOutOfBoundsConstant` predicate to check if the array offs You should now have five results in the test (six in the built database). 
### Exercise 4 +Note: We *could* evolve this code to handle `size`s inside conditionals using +guards on data flow. But this amounts to implementing a small interpreter. +The range analysis library already handles conditional branches; we don't +have to use guards on data flow -- don't implement your own interpreter if you can +use the library. + Again, a slight longer C [source snippet](solutions-tests/Exercise4/test.c). A common issue with the `SimpleRangeAnalysis` library is handling of cases where the bounds are undeterminable at compile-time on one or more paths. For example, even though certain branches have clearly defined bounds, the range analysis library will define the `upperBound` and `lowerBound` of `val` as `INT_MIN` and `INT_MAX` respectively: @@ -164,4 +204,38 @@ Do not compute the GVN of the entire array index expression; use the base of an Exclude duplicate results by only reporting `isOffsetOutOfBoundsGVN` for `access`/`source` pairs that are not already reported by `isOffsetOutOfBoundsConstant`. -You should now see thirteen results. \ No newline at end of file +You should now see thirteen results. + +Global value numbering only knows that runtime values are equal; they are not +comparable (`<, >, <=` etc.), and the *actual* value is not known. +Reference: https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering/ + + +In the query, look for and use *relative* values between allocation and use. To +do this, use GVN. +This is the case in + + void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) + { + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT + } + +Range analysis won't bound `sz * x * y`, so switch to global value numbering. + +Global value numbering finds expressions that are known to have the same runtime +value, independent of structure. To get the Global Value Number in CodeQL: + + ... 
+ globalValueNumber(e) = globalValueNumber(sizeExpr) and + e != sizeExpr + ... + +We can use global value numbering to identify common values as first step, but for +expressions like + + buf[sz * x * y - 1]; // COMPLIANT + +we have to "evaluate" the expressions -- or at least bound them. From b1db4ec02ca2f15b4212fb408abbd47c81f0e9d0 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 9 May 2023 09:06:23 -0700 Subject: [PATCH 02/28] Add explicit workshop/session support --- session-db/DB/db.c | 85 +++++++++++++++++++++++++ session-db/DB/db.expected | 3 + session-db/DB/db.qlref | 1 + session-db/qlpack.yml | 8 +++ session/qlpack.yml | 6 ++ session/session.ql | 126 ++++++++++++++++++++++++++++++++++++++ 6 files changed, 229 insertions(+) create mode 100644 session-db/DB/db.c create mode 100644 session-db/DB/db.expected create mode 100644 session-db/DB/db.qlref create mode 100644 session-db/qlpack.yml create mode 100644 session/qlpack.yml create mode 100644 session/session.ql diff --git a/session-db/DB/db.c b/session-db/DB/db.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-db/DB/db.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 
100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session-db/DB/db.expected b/session-db/DB/db.expected new file mode 100644 index 0000000..6ee8899 --- /dev/null +++ b/session-db/DB/db.expected @@ -0,0 +1,3 @@ +| test.c:4:5:4:10 | access to array | 0 | test.c:3:17:3:22 | call to malloc | 100 | +| test.c:5:5:5:11 | access to array | 99 | test.c:3:17:3:22 | call to malloc | 100 | +| test.c:6:5:6:12 | access to array | 100 | test.c:3:17:3:22 | call to malloc | 100 | diff --git a/session-db/DB/db.qlref b/session-db/DB/db.qlref new file mode 100644 index 0000000..21829a4 --- /dev/null +++ b/session-db/DB/db.qlref @@ -0,0 +1 @@ +session.ql \ No newline at end of file diff --git a/session-db/qlpack.yml b/session-db/qlpack.yml new file mode 100644 index 0000000..f98375c --- /dev/null +++ b/session-db/qlpack.yml @@ -0,0 +1,8 @@ +--- +library: false +name: session-db +version: 0.0.1 
+dependencies: + "session": "*" +extractor: cpp +tests: . diff --git a/session/qlpack.yml b/session/qlpack.yml new file mode 100644 index 0000000..6c8e1f8 --- /dev/null +++ b/session/qlpack.yml @@ -0,0 +1,6 @@ +--- +library: false +name: session +version: 0.0.1 +dependencies: + codeql/cpp-all: 0.6.1 diff --git a/session/session.ql b/session/session.ql new file mode 100644 index 0000000..9066819 --- /dev/null +++ b/session/session.ql @@ -0,0 +1,126 @@ +/** + * @ kind problem + */ + +import cpp + +// Ex.1 +// void test_const(void) +// void test_const_var(void) + +from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr +where + // malloc (100) + // ^^^^^^ in the AllocationExpr buffer + + // buf[...] + // ^^^ ArrayExpr access + + accessIdx = access.getArrayOffset().getValue().toInt() and + // malloc (100) + // ^^^ + allocSizeExpr.getValue().toInt() = bufferSize +select buffer, access, accessIdx + +// from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessOffset, int accessIdx, int elementSize, Expr allocSizeExpr +// where +// // malloc (100) +// // ^^^^^^ in the AllocationExpr buffer + +// // buf[...] +// // ^^^ ArrayExpr access + +// accessIdx = access.getArrayOffset().getValue().toInt() and +// elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and +// accessOffset = accessIdx * elementSize and +// // malloc (100) +// // ^^^ +// allocSizeExpr.getValue().toInt() = bufferSize +// select buffer, access + + +/* + * char *buf = malloc(100); + * buf[0]; // COMPLIANT + * buf[99]; // COMPLIANT + * buf[100]; // NON_COMPLIANT + * + * #define FACTOR 2 + * ... 
+ * unsigned long size = 100 * FACTOR; + * char *buf = malloc(size); + * buf[0]; // COMPLIANT + * buf[99]; // COMPLIANT + * buf[size - 1]; // COMPLIANT + * buf[100]; // NON_COMPLIANT + * buf[size]; // NON_COMPLIANT + */ + +import semmle.code.cpp.dataflow.DataFlow + +class BufferAccess extends ArrayExpr { + AllocationExpr buffer; + int bufferSize; + Expr offsetExpr; + BufferAccess() { + exists(Expr allocSizeExpr | + DataFlow::localExprFlow(buffer, this.getArrayBase()) and + offsetExpr = this.getArrayOffset() and + allocSizeExpr.getValue().toInt() = bufferSize and + DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr())) + } + + AllocationExpr getBuffer() { + result = buffer + } + + Expr getAccessExpr() { + result = offsetExpr + } + + int getBufferSize() { + result = bufferSize + } +} + +// predicate bufferAccess(AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessOffset) { +// exists(int accessIdx, int elementSize, Expr allocSizeExpr | +// DataFlow::localExprFlow(buffer, access.getArrayBase()) and +// accessIdx = access.getArrayOffset().getValue().toInt() and +// elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and +// accessOffset = accessIdx * elementSize and +// allocSizeExpr.getValue().toInt() = bufferSize and +// DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) +// ) +// } + + + +// from BufferAccess ba, int accessOffset, int bufferSize +// where upperBound(ba.getAccessExpr()) = accessOffset and +// bufferSize = ba.getBufferSize() and +// accessOffset >= bufferSize +// select ba, "Possible out of bounds access with offset " + accessOffset + " and size " + bufferSize + +// from AllocationExpr alloc, ArrayExpr access, Expr sizeExpr, Expr partOfAccess +// where alloc.getSizeExpr() = sizeExpr and +// ( +// // malloc(sz * x * y); +// // ... 
+// // buf[sz * x * y]; +// access.getArrayOffset() = partOfAccess +// or +// // buf[sz * x * y + 1]; +// exists(AddExpr add | +// access.getArrayOffset() = add and +// add.getAnOperand() = partOfAccess and +// add.getAnOperand().getValue().toInt() > 0 +// ) +// ) +// and +// partOfAccess != sizeExpr and +// globalValueNumber(partOfAccess) = globalValueNumber(sizeExpr) +// select sizeExpr, partOfAccess + +// import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis +// import semmle.code.cpp.valuenumbering.GlobalValueNumbering From c2d0120e624440042256590be54398d2f6aac04c Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 9 May 2023 09:06:54 -0700 Subject: [PATCH 03/28] Build database from session support file --- build-database.sh | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/build-database.sh b/build-database.sh index fad5699..fb3141c 100755 --- a/build-database.sh +++ b/build-database.sh @@ -1,3 +1,4 @@ -SRCDIR=$(pwd)/solutions-tests/Exercise6 -DB=$(pwd)/cpp-runtime-values-db -codeql database create --language=cpp -s "$SRCDIR" -j 8 -v $DB --command="clang -fsyntax-only -Wno-unused-value $SRCDIR/test.c" \ No newline at end of file +#!/bin/bash +SRCDIR=$(pwd)/session-db +DB=$SRCDIR/cpp-runtime-values-db +codeql database create --language=cpp -s "$SRCDIR" -j 8 -v $DB --command="clang -fsyntax-only -Wno-unused-value $SRCDIR/DB/db.c" From 118abc68b1b3f1abcf1c1e75e3e5ae51f30b7e07 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 9 May 2023 09:18:25 -0700 Subject: [PATCH 04/28] clarify README --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index a187294..721c83c 100644 --- a/README.md +++ b/README.md @@ -206,6 +206,8 @@ Exclude duplicate results by only reporting `isOffsetOutOfBoundsGVN` for `access You should now see thirteen results. +Some notes: + Global value numbering only knows that runtime values are equal; they are not comparable (`<, >, <=` etc.), and the *actual* value is not known. 
Reference: https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering/ From 30e5f43ba588db8fb2049a946bdd1c50a86c9be4 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 9 May 2023 12:22:01 -0700 Subject: [PATCH 05/28] tree depth change --- session/session.org | 415 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 415 insertions(+) create mode 100644 session/session.org diff --git a/session/session.org b/session/session.org new file mode 100644 index 0000000..2877aac --- /dev/null +++ b/session/session.org @@ -0,0 +1,415 @@ +* CodeQL Workshop --- Using Data-Flow and Range Analysis to Find Out-Of-Bounds Accesses +:PROPERTIES: +:CUSTOM_ID: codeql-workshop--using-data-flow-and-range-analysis-to-find-out-of-bounds-accesses +:END: +* Acknowledgements + :PROPERTIES: + :CUSTOM_ID: acknowledgements + :END: + +This session-based workshop is based on the exercise/unit-test-based material at +https://github.com/kraiouchkine/codeql-workshop-runtime-values-c, which in turn is +based on a significantly simplified and modified version of the +[[https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll][OutOfBounds.qll library]] from the +[[https://github.com/github/codeql-coding-standards][CodeQL Coding Standards +repository]]. + +* Setup Instructions + :PROPERTIES: + :CUSTOM_ID: setup-instructions + :END: +- Install [[https://code.visualstudio.com/][Visual Studio Code]]. + +- Install the + [[https://codeql.github.com/docs/codeql-for-visual-studio-code/setting-up-codeql-in-visual-studio-code/][CodeQL extension for Visual Studio Code]]. + +- Install the latest version of the + [[https://github.com/github/codeql-cli-binaries/releases][CodeQL CLI]]. 
+ +- Clone this repository: + #+begin_src sh + git clone https://github.com/hohn/codeql-workshop-runtime-values-c + #+end_src + +- Install the CodeQL pack dependencies using the command + =CodeQL: Install Pack Dependencies= and select =exercises=, + =solutions=, =exercises-tests=, =session=, =session-db= and + =solutions-tests= from the list of packs. + +- If you have CodeQL on your PATH, build the database using + =build-database.sh= and load the database with the VS Code CodeQL + extension. It is at =session-db/cpp-runtime-values-db=. + + - Alternatively, you can download + [[https://drive.google.com/file/d/1N8TYJ6f4E33e6wuyorWHZHVCHBZy8Bhb/view?usp=sharing][this + pre-built database]]. + +- If you do *not* have CodeQL on your PATH, build the database using the + unit test system. Choose the =TESTING= tab in VS Code and run the + =session-db/DB/db.qlref= test. The test will fail, but it leaves a + usable CodeQL database in =session-db/DB/DB.testproj=. + +- ❗Important❗: Run =initialize-qltests.sh= to initialize the tests. + Otherwise, you will not be able to run the QLTests in + =exercises-tests=. + +* Introduction + :PROPERTIES: + :CUSTOM_ID: introduction + :END: +This workshop focuses on analyzing and relating two values --- array +access indices and memory allocation sizes --- in order to identify +simple cases of out-of-bounds array accesses. 
+ +The following snippets demonstrate how an out-of-bounds array access can +occur: + +#+begin_src cpp +char* buffer = malloc(10); +buffer[9] = 'a'; // ok +buffer[10] = 'b'; // out-of-bounds +#+end_src + +A more complex example: + +#+begin_src cpp +char* buffer; +if(rand() == 1) { + buffer = malloc(10); +} +else { + buffer = malloc(11); +} +size_t index = 0; +if(rand() == 1) { + index = 10; +} +buffer[index]; // potentially out-of-bounds depending on control-flow +#+end_src + +Another common case /not/ covered in this introductory workshop involves +loops, as follows: + +#+begin_src cpp +int elements[5]; +for (int i = 0; i <= 5; ++i) { + elements[i] = 0; +} +#+end_src + +To find these issues, we can implement an analysis that tracks the upper +or lower bounds on an expression and, combined with data-flow analysis +to reduce false-positives, identifies cases where the index of the array +results in an access beyond the allocated size of the buffer. + +* A Note on the Scope of This Workshop + :PROPERTIES: + :CUSTOM_ID: a-note-on-the-scope-of-this-workshop + :END: +This workshop is not intended to be a complete analysis that is useful +for real-world cases of out-of-bounds analyses for reasons including but +not limited to: + +- Missing support for loops and recursion +- No interprocedural analysis +- Missing size calculation of arrays where the element size is not 1 +- No support for pointer arithmetic or in general, operations other than + addition and subtraction +- Overly specific modelling of a buffer access as an array expression + +The goal of this workshop is rather to demonstrate the building blocks +of analyzing run-time values and how to apply those building blocks to +modelling a common class of vulnerability. 
A more comprehensive and +production-appropriate example is the +[[https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll][OutOfBounds.qll +library]] from the +[[https://github.com/github/codeql-coding-standards][CodeQL Coding +Standards repository]]. + +* Session/Workshop notes + :PROPERTIES: + :CUSTOM_ID: sessionworkshop-notes + :END: +Unlike the [[#org3b74422][exercises]], which use the /collection/ of +test problems in =exercises-tests=, a workshop follows +=session/session.ql= and uses a /single/ database built from a single, +larger segment of code. + +* Exercises + :PROPERTIES: + :CUSTOM_ID: exercises + :END: +These exercises use the collection of test problems in =exercises-tests=. + +** Exercise 1 + :PROPERTIES: + :CUSTOM_ID: exercise-1 + :END: +In the first exercise we are going to start by modelling a dynamic +allocation with =malloc= and an access to that allocated buffer with an +array expression. The goal of this exercise is to then output the array +access, buffer, array size, and buffer offset. + +The [[file:solutions-tests/Exercise1/test.c][first test-case]] is a +simple one, as both the allocation size and array offsets are constants. + +For this exercise, connect the allocation(s), the array accesses, and +the sizes in each. + +Run the query and ensure that you have three results. + +*** Hints + :PROPERTIES: + :CUSTOM_ID: hints + :END: +1. =Expr::getValue()::toInt()= can be used to get the integer value of a + constant expression. +2. Use =DataFlow::localExprFlow()= to relate the allocated buffer to the + array base. + +** Exercise 2 + :PROPERTIES: + :CUSTOM_ID: exercise-2 + :END: +This exercise uses the same C source code with an addition: a constant +array size propagated [[file:solutions-tests/Exercise2/test.c][via a +variable]]. + +XX: + +1. Start with a query. + =elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()= +2. Convert to a predicate. +3. 
Then use classes, if desired. =class BufferAccess extends ArrayExpr= + is different from those below. + +*** Task 1 + :PROPERTIES: + :CUSTOM_ID: task-1 + :END: +With the basic elements of the analysis in place, refactor the query +into two classes: =AllocationCall= and =ArrayAccess=. The +=AllocationCall= class should model a call to =malloc= and the +=ArrayAccess= class should model an array access expression +(=ArrayExpr=). + +*** Task 2 + :PROPERTIES: + :CUSTOM_ID: task-2 + :END: +Next, note the missing results for the cases in =test_const_var= which +involve a variable access rather than a constant. The goal of this task +is to implement the =getSourceConstantExpr=, =getFixedSize=, and +=getFixedArrayOffset= predicates to handle the case where the allocation +size or array index are variables rather than integer constants. + +Use local data-flow analysis to complete the =getSourceConstantExpr= +predicate. The =getFixedSize= and =getFixedArrayOffset= predicates can +be completed using =getSourceConstantExpr=. + +** Exercise 3 + :PROPERTIES: + :CUSTOM_ID: exercise-3 + :END: +This exercise has slightly more C source code +[[file:solutions-tests/Exercise3/test.c][here]]. + +XX: test_const_branch buf[100] with size == 100 + +Running the query from Exercise 2 against the database yields a +significant number of missing or incorrect results. The reason is that +although great at identifying compile-time constants and their use, +data-flow analysis is not always the right tool for identifying the +/range/ of values an =Expr= might have, particularly when multiple +potential constants might flow to an =Expr=. + +XX: explain using source code. + +XX: autogen accessor predicates? + +The CodeQL standard library provides several mechanisms for addressing this +problem; in the remainder of this workshop we will explore two of them: +=SimpleRangeAnalysis= and, later, =GlobalValueNumbering=. 
+ +Although not in the scope of this workshop, a standard use-case for +range analysis is reliably identifying integer overflow and validating +integer overflow checks. + +*** Task 1 + :PROPERTIES: + :CUSTOM_ID: task-1-1 + :END: +Change the implementation of the =getFixedSize= and +=getFixedArrayOffset= predicates to use the =SimpleRangeAnalysis= +library rather than data-flow. Specifically, the relevant predicates are +=upperBound= and =lowerBound=. Decide which to use for this exercise +(=upperBound=, =lowerBound=, or both). + +Experiment with different combinations of the =upperBound= and +=lowerBound= predicates to see how they impact the results. + +Hint: + +Use =upperBound= for both predicates. + +*** Task 2 + :PROPERTIES: + :CUSTOM_ID: task-2-1 + :END: +Implement the =isOffsetOutOfBoundsConstant= predicate to check if the +array offset is out-of-bounds. A template has been provided for you. + +You should now have five results. + +** Exercise 4 + :PROPERTIES: + :CUSTOM_ID: exercise-4 + :END: +XX: The range analysis already handles conditional branches; we don't +have to use guards on data flow -- don't implement your own interpreter +if you can use the library. + +Again, a slightly longer C [[file:solutions-tests/Exercise4/test.c][source +snippet]]. + +A common issue with the =SimpleRangeAnalysis= library is handling of +cases where the bounds are undeterminable at compile-time on one or more +paths. For example, even though certain paths have clearly defined +bounds, the range analysis library will define the =lowerBound= and +=upperBound= of =val= as =INT_MIN= and =INT_MAX= respectively: + +#+begin_src cpp +int val = rand() ? rand() : 30; +#+end_src + +A similar case is present in the =test_const_branch= and +=test_const_branch2= test-cases in the =Exercise3= test case. Note the +issues with your Exercise 3 solution for these test-cases. 
In these cases, it is +necessary to augment range analysis with data-flow and restrict the +bounds to the upper or lower bound of computable constants that flow to +a given expression. + +*** Task 1 + :PROPERTIES: + :CUSTOM_ID: task-1-2 + :END: +To refine the bounds used for validation, start by implementing +=getSourceConstantExpr=. Then, implement =getMaxStatedValue= according +to the +[[https://codeql.github.com/docs/ql-language-reference/ql-language-specification/#qldoc-qldoc][QLDoc]] +documentation in =Exercise4.ql=. + +** Task 2 + :PROPERTIES: + :CUSTOM_ID: task-2-2 + :END: +Update the =getFixedSize= and =getFixedArrayOffset= predicates to use +the =getMaxStatedValue= predicate. + +You should now have six results. However, some results annotated as +=NON_COMPLIANT= in the test-case are still missing. Why is that? + +Hint: + +Which expression is passed to the =getMaxStatedValue= predicate? + +Answer: + +The missing results involve arithmetic offsets (right operand) from a +base value (left operand). The =getMaxStatedValue= predicate should only +be called on the base expression, not any =AddExpr= or =SubExpr=, as +=getMaxStatedValue= relies on data-flow analysis. + +** Exercise 5 + :PROPERTIES: + :CUSTOM_ID: exercise-5 + :END: +The [[file:solutions-tests/Exercise5/test.c][source snippet]] is +unchanged but replicated for the test. + +XX: the cases +=39:14: if (size < 199) 69:20: if (alloc_size < 199)= need to be +exempted. + +XX: examine the index expression value, and compare it to the +upper/lower bounds. /Then/ expand the query. + +Since we aren't using pure range analysis via the =upperBound= and/or +=lowerBound= predicates, handling =getMaxStatedValue= for =AddExpr= and +=SubExpr= is necessary. + +In the interest of time and deduplicating work in this workshop, only +implement that check in =getFixedArrayOffset=. In a real-world scenario, +it would be necessary to analyze offsets of both the buffer allocation +size and array index. 
+ +Complete the following predicates: + +- =getExprOffsetValue= +- =getFixedArrayOffset= + +You should now see nine results. + +** Exercise 6 + :PROPERTIES: + :CUSTOM_ID: exercise-6 + :END: +TODO: intro to GVN write-up here TODO: finish below instructions + +XX: reference: +[[https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering/]] +Global value numbering only knows that runtime values are equal; they +are not comparable (=<, >, <== etc.), and the /actual/ value is not +known. + +XX: Look for and use /relative/ values between allocation and use. To do +this, use GVN. + +XX: This is the case in + +#+begin_example +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} +#+end_example + +XX: Range analysis won't bound =sz * x * y=, so switch to global value +numbering. Or use hashcons. + +XX: global value numbering finds expressions with the same known value, +independent of structure. + +#+begin_example +... +globalValueNumber(e) = globalValueNumber(sizeExpr) and +e != sizeExpr +... +#+end_example + +XX: hashcons: every value gets a number based on structure. Fails on + +#+begin_example +char *buf = malloc(sz * x * y); +sz = 100; +buf[sz * x * y - 1]; // COMPLIANT +#+end_example + +XX: global value numbering to identify common values as first step, but +for expressions like + +#+begin_example +buf[sz * x * y - 1]; // COMPLIANT +#+end_example + +we have to "evaluate" the expressions -- or at least bound them. + +The final exercise is to implement the =isOffsetOutOfBoundsGVN= +predicate to [...] 
+ + + + From dafd5cc1352e08732dc86034a02ac6b608671aec Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Wed, 10 May 2023 12:39:14 -0700 Subject: [PATCH 06/28] change to session-based workshop --- .gitignore | 3 +- ...l-workshop-runtime-values-c.code-workspace | 3 +- session/session.org | 957 +++++++++++++----- session/session.ql | 167 +-- 4 files changed, 753 insertions(+), 377 deletions(-) diff --git a/.gitignore b/.gitignore index 3e907c5..18dcd40 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,5 @@ .cache +*.html *.testproj *.lock.yaml *.lock.yml @@ -6,4 +7,4 @@ cpp-runtime-values-db exercises-tests/*/test.c settings.json -.DS_Store \ No newline at end of file +.DS_Store diff --git a/codeql-workshop-runtime-values-c.code-workspace b/codeql-workshop-runtime-values-c.code-workspace index 23f4f79..8acda77 100644 --- a/codeql-workshop-runtime-values-c.code-workspace +++ b/codeql-workshop-runtime-values-c.code-workspace @@ -5,6 +5,7 @@ } ], "settings": { - "sarif-viewer.connectToGithubCodeScanning": "off" + "sarif-viewer.connectToGithubCodeScanning": "off", + "codeQL.runningQueries.autoSave": true } } \ No newline at end of file diff --git a/session/session.org b/session/session.org index 2877aac..0e3be46 100644 --- a/session/session.org +++ b/session/session.org @@ -2,9 +2,9 @@ :PROPERTIES: :CUSTOM_ID: codeql-workshop--using-data-flow-and-range-analysis-to-find-out-of-bounds-accesses :END: -* Acknowledgements +* Acknowledgments :PROPERTIES: - :CUSTOM_ID: acknowledgements + :CUSTOM_ID: acknowledgments :END: This session-based workshop is based on the exercise/unit-test-based material at @@ -130,33 +130,30 @@ Standards repository]]. :PROPERTIES: :CUSTOM_ID: sessionworkshop-notes :END: -Unlike the the [[#org3b74422][exercises]] which use the /collection/ of -test problems in =exercises-test=, a workshop follows -=session/session.ql= and uses a /single/ database built from a single, -larger segment of code. 
-* Exercises - :PROPERTIES: - :CUSTOM_ID: exercises - :END: -These exercises use the collection of test problems in =exercises-test=. +Unlike the [[../README.md#org3b74422][exercises]], which use the /collection/ of test problems in +=exercises-test=, this workshop is a sequential session that follows the +actual process of writing CodeQL: it uses a /single/ database built from a single, +larger segment of code. For this workshop, the larger segment is still a simplified +code skeleton, not a full source code repository. -** Exercise 1 +** Step 1 :PROPERTIES: :CUSTOM_ID: exercise-1 :END: -In the first exercise we are going to start by modelling a dynamic -allocation with =malloc= and an access to that allocated buffer with an -array expression. The goal of this exercise is to then output the array -access, buffer, array size, and buffer offset. - -The [[file:solutions-tests/Exercise1/test.c][first test-case]] is a -simple one, as both the allocation size and array offsets are constants. + In the first step we are going to start by + 1. modelling a dynamic allocation with =malloc= and + 2. modelling an access to that allocated buffer with an + array expression. -For this exercise, connect the allocation(s), the array accesses, and -the sizes in each. + The goal of this step is then to output the array access, buffer, array size, + and buffer offset. -Run the query and ensure that you have three results. + The focus here is on + : void test_const(void) + and + : void test_const_var(void) + in [[file:~/local/codeql-workshop-runtime-values-c/session-db/DB/db.c][db.c]]. *** Hints :PROPERTIES: @@ -164,252 +161,702 @@ Run the query and ensure that you have three results. :END: 1. =Expr::getValue()::toInt()= can be used to get the integer value of a constant expression. -2. Use =DataFlow::localExprFlow()= to relate the allocated buffer to the - array base. 
- -** Exercise 2 - :PROPERTIES: - :CUSTOM_ID: exercise-2 - :END: -This exercise uses the same C source code with an addition: a constant -array size propagated [[file:solutions-tests/Exercise2/test.c][via a -variable]]. - -XX: - -1. start with query. - =elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()= -2. convert to predicate. -3. then use classes, if desired. =class BufferAccess extends ArrayExpr= - is different from those below. - -*** Task 1 - :PROPERTIES: - :CUSTOM_ID: task-1 - :END: -With the basic elements of the analysis in place, refactor the query -into two classes: =AllocationCall= and =ArrayAccess=. The -=AllocationCall= class should model a call to =malloc= and the -=ArrayAccess= class should model an array access expression -(=ArrayExpr=). - -*** Task 2 - :PROPERTIES: - :CUSTOM_ID: task-2 - :END: -Next, note the missing results for the cases in =test_const_var= which -involve a variable access rather than a constant. The goal of this task -is to implement the =getSourceConstantExpr=, =getFixedSize=, and -=getFixedArrayOffset= predicates to handle the case where the allocation -size or array index are variables rather than integer constants. - -Use local data-flow analysis to complete the =getSourceConstantExpr= -predicate. The =getFixedSize= and =getFixedArrayOffset= predicates can -be completed using =getSourceConstantExpr=. - -** Exercise 3 - :PROPERTIES: - :CUSTOM_ID: exercise-3 - :END: -This exercise has slightly more C source code -[[file:solutions-tests/Exercise3/test.c][here]]. - -XX: test_const_branch buf[100] with size == 100 - -Running the query from Exercise 2 against the database yields a -significant number of missing or incorrect results. 
The reason is that -although great at identifying compile-time constants and their use, -data-flow analysis is not always the right tool for identifying the -/range/ of values an =Expr= might have, particularly when multiple -potential constants might flow to an =Expr=. -XX: explain using source code. +*** Solution + #+BEGIN_SRC java +// Step 1 +// void test_const(void) +// void test_const_var(void) +from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr +where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + bufferSize = allocSizeExpr.getValue().toInt() +select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr + #+END_SRC + +This produces 12 results, with some cross-function pairs. + +** Step 2 +The previous query fails to connect the =malloc= calls with the array accesses, +and =mallocs= from one function are paired with accesses in another. + +To address these, take the query from the previous exercise and +1. connect the allocation(s) with the +2. array accesses -XX: autogen accessor predicates? - -The CodeQL standard library several mechanisms for addressing this -problem; in the remainder of this workshop we will explore two of them: -=SimpleRangeAnalysis= and, later, =GlobalValueNumbering=. - -Although not in the scope of this workshop, a standard use-case for -range analysis is reliably identifying integer overflow and validating -integer overflow checks. - -*** Task 1 - :PROPERTIES: - :CUSTOM_ID: task-1-1 - :END: -Change the implementation of the =getFixedSize= and -=getFixedArrayOffset= predicates to use the =SimpleRangeAnalysis= -library rather than data-flow. 
Specifically, the relevant predicates are -=upperBound= and =lowerBound=. Decide which to use for this exercise -(=upperBound=, =lowerBound=, or both). - -Experiment with different combinations of the =upperBound= and -=lowerBound= predicates to see how they impact the results. - -Hint: - -Use =upperBound= for both predicates. - -*** Task 2 - :PROPERTIES: - :CUSTOM_ID: task-2-1 - :END: -Implement the =isOffsetOutOfBoundsConstant= predicate to check if the -array offset is out-of-bounds. A template has been provided for you. - -You should now have five results. - -** Exercise 4 - :PROPERTIES: - :CUSTOM_ID: exercise-4 - :END: -XX: The range analysis already handles conditional branches; we don't -have to use guards on data flow -- don't implement your own interpreter -if you can use the library. - -Again, a slight longer C [[file:solutions-tests/Exercise4/test.c][source -snippet]]. - -A common issue with the =SimpleRangeAnalysis= library is handling of -cases where the bounds are undeterminable at compile-time on one or more -paths. For example, even though certain paths have clearly defined -bounds, the range analysis library will define the =upperBound= and -=lowerBound= of =val= as =INT_MIN= and =INT_MAX= respectively: - -#+begin_src cpp -int val = rand() ? rand() : 30; -#+end_src - -A similar case is present in the =test_const_branch= and -=test_const_branch2= test-cases in the =Exercise3= test case. Note the -issues with your Exercise 3 for these test-cases. In these cases, it is -necessary to augment range analysis with data-flow and restrict the -bounds to the upper or lower bound of computable constants that flow to -a given expression. - -*** Task 1 +*** Hints :PROPERTIES: - :CUSTOM_ID: task-1-2 + :CUSTOM_ID: hints :END: -To refine the bounds used for validation, start by implementing -=getSourceConstantExpr=. 
Then, implement =getMaxStatedValue= according -to the -[[https://codeql.github.com/docs/ql-language-reference/ql-language-specification/#qldoc-qldoc][QLDoc]] -documentation in =Exercise4.ql=. - -** Task 2 - :PROPERTIES: - :CUSTOM_ID: task-2-2 - :END: -Update the =getFixedSize= and =getFixedArrayOffset= predicates to use -the =getMaxStatedValue= predicate. - -You should now have six results. However, some results annotated as -=NON_COMPLIANT= in the test-case are still missing. Why is that? - -Hint: - -Which expression is passed to the =getMaxStatedValue= predicate? - -Answer: - -The missing results involve arithmetic offsets (right operand) from a -base value (left operand). The =getMaxStatedValue= predicate should only -be called on the base expression, not any =AddExpr= or =SubExpr=, as -=getMaxStatedValue= relies on data-flow analysis. - -** Exercise 5 - :PROPERTIES: - :CUSTOM_ID: exercise-5 - :END: -The [[file:solutions-tests/Exercise5/test.c][source snippet]] is -unchanged but replicated for the test. - -XX: the cases -=39:14: if (size < 199) 69:20: if (alloc_size < 199)= need to be -exempted. - -XX: examine the index expression value, and compare it to the -upper/lower bounds. /Then/ expand the query. - -Since we aren't using pure range analysis via the =upperBound= and/or -=lowerBound= predicates, handling =getMaxStatedValue= for =AddExpr= and -=SubExpr= is necessary. - -In the interest of time and deduplicating work in this workshop, only -implement that check in =getFixedArrayOffset=. In a real-world scenario, -it would be necessary to analyze offsets of both the buffer allocation -size and array index. - -Complete the following predicates: - -- =getExprOffsetValue= -- =getFixedArrayOffset= - -You should now see nine results. - -** Exercise 6 +1. Use =DataFlow::localExprFlow()= to relate the allocated buffer to the + array base. +2. The array base is the =buf= part of =buf[0]=. Use the + =ArrayExpr.getArrayBase()= predicate. 
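To make the terms in these hints concrete, here is a minimal C sketch of what "array base" and "array offset" refer to in an access such as =buf[0]=. It is not taken from =db.c=; the function names =read_indexed= and =read_pointer= are hypothetical and only serve the illustration.

```c
#include <stddef.h>

/* Reads one element using array syntax.
 * In `base[i]`, `base` is the array base and `i` is the array offset. */
char read_indexed(const char *base, size_t i)
{
    return base[i];
}

/* The same read written as pointer arithmetic: base[i] means *(base + i). */
char read_pointer(const char *base, size_t i)
{
    return *(base + i);
}
```

Both functions read the same byte. The query only needs to know that the base of the access is the pointer that =malloc= returned, which is what the local data-flow step checks.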
+ +*** Solution + + #+BEGIN_SRC java + import cpp + import semmle.code.cpp.dataflow.DataFlow + + // Step 2 + // void test_const(void) + // void test_const_var(void) + from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr + where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + bufferSize = allocSizeExpr.getValue().toInt() and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) + select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr + #+END_SRC + +*** Results + There are now 3 results. These are from only one function, the one using constants. + +** Step 3 :PROPERTIES: - :CUSTOM_ID: exercise-6 + :CUSTOM_ID: exercise-2 :END: -TODO: intro to GVN write-up here TODO: finish below instructions - -XX: reference: -[[https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering/]] -Global value numbering only knows that runtime values are equal; they -are not comparable (=<, >, <== etc.), and the /actual/ value is not -known. - -XX: Look for and use /relative/ values between allocation and use. To do -this, use GVN. - -XX: This is the case in - -#+begin_example -void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) -{ - char *buf = malloc(sz * x * y); - buf[sz * x * y - 1]; // COMPLIANT - buf[sz * x * y]; // NON_COMPLIANT - buf[sz * x * y + 1]; // NON_COMPLIANT -} -#+end_example - -XX: Range analyis won't bound =sz * x * y=, so switch to global value -numbering. Or use hashcons. - -XX: global value numbering finds expressions with the same known value, -independent of structure. 
- -#+begin_example -... -globalValueNumber(e) = globalValueNumber(sizeExpr) and -e != sizeExpr -... -#+end_example - -XX: hashcons: every value gets a number based on structure. Fails on + The previous results need to be extended to the case + #+BEGIN_SRC c++ + void test_const_var(void) + { + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + ... + } + #+END_SRC + + Here, the =malloc= argument is a variable with a known value. + + We include this result by removing the size-retrieval from the prior query. + +*** Solution + + #+BEGIN_SRC java + + import cpp + import semmle.code.cpp.dataflow.DataFlow + + // Step 3 + // void test_const_var(void) + from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr + where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + // bufferSize = allocSizeExpr.getValue().toInt() and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) + select buffer, access, accessIdx, access.getArrayOffset() + #+END_SRC + +*** Results + Now, we get 12 results, including some from other test cases. + +** Step 4 + We are looking for out-of-bounds accesses, so we need to include the + bounds, but in a different way. + + Note the results for the cases in =test_const_var= which involve a variable + access rather than a constant. The next goal is to handle the case where the + allocation size or array index are variables (with constant values) rather than + integer constants. + + We have an expression =size= that flows into the =malloc()= call. 
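The situation described here can be sketched in C as follows. This is an illustration modelled on the =test_const_var= snippet quoted in Step 3, not the actual contents of =db.c=; the function name =alloc_size_via_variable= is hypothetical.

```c
#include <stdlib.h>

/* The constant 100 never appears at the malloc call itself.
 * It is written once, stored in `size`, and reaches malloc through
 * that variable. */
unsigned long alloc_size_via_variable(void)
{
    unsigned long size = 100;   /* source: the constant expression */
    char *buf = malloc(size);   /* sink: the argument is `size`, not 100 */

    free(buf);                  /* free(NULL) is a no-op, so this is safe */
    return size;
}
```

The query therefore has to start from the constant and follow it to the =malloc= argument, rather than asking the argument for its value directly.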
+ +*** Solution + + #+BEGIN_SRC java + import cpp + import semmle.code.cpp.dataflow.DataFlow + + // Step 4 + from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr, int bufferSize, Expr bse + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + // bufferSize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + exists(Expr bufferSizeExpr | + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + and bse = bufferSizeExpr + ) and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) + select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bse + #+END_SRC + +*** Results + Now, we get 15 results, limited to statically determined values. + + XX: to implement predicates + =getSourceConstantExpr=, =getFixedSize=, and =getFixedArrayOffset= + Use local data-flow analysis to complete the =getSourceConstantExpr= + predicate. The =getFixedSize= and =getFixedArrayOffset= predicates can + be completed using =getSourceConstantExpr=. + +XX: + 1. start with query. + =elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()= + 2. convert to predicate. + 3. then use classes, if desired. =class BufferAccess extends ArrayExpr= + is different from those below. + + +** Step 5 -- SimpleRangeAnalysis + Running the query from Step 2 against the database yields a + significant number of missing or incorrect results. 
The reason is that + although great at identifying compile-time constants and their use, + data-flow analysis is not always the right tool for identifying the + /range/ of values an =Expr= might have, particularly when multiple + potential constants might flow to an =Expr=. + + The range analysis already handles conditional branches; we don't + have to use guards on data flow -- don't implement your own interpreter + if you can use the library. + + The CodeQL standard library has several mechanisms for addressing this + problem; in the remainder of this workshop we will explore two of them: + =SimpleRangeAnalysis= and, later, =GlobalValueNumbering=. + + Although not in the scope of this workshop, a standard use-case for + range analysis is reliably identifying integer overflow and validating + integer overflow checks. + + First, simplify the =from...where...select=: + 1. Remove unnecessary =exists= clauses. + 2. Use =DataFlow::localExprFlow= for the buffer and allocation sizes, not + =getValue().toInt()= + + Then, add the use of the =SimpleRangeAnalysis= library. Specifically, the + relevant library predicates are =upperBound= and =lowerBound=, to be used with + the buffer access argument. Experiment and decide which to use for this + exercise (=upperBound=, =lowerBound=, or both). + + This requires the import + : import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +*** Solution + #+BEGIN_SRC java + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + // Step 5 + from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] 
+ // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) + select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr + #+END_SRC + +*** Results + Now, we get 48 results. + +** Step 6 + To finally determine (some) out-of-bounds accesses, we have to convert + allocation units (usually in bytes) to size units. Then we are finally in a + position to compare buffer allocation size to the access index to find + out-of-bounds accesses -- at least for expressions with known values. + + Add these to the query: + 1. Convert allocation units to size units. + 2. Convert access units to the same size units. + + Hints: + 1. We need the size of the array element. Use + =access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType()= + to see the type and + =access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()= + to get its size. + + 2. Note from the docs: + /The malloc() function allocates size bytes of memory and returns a pointer + to the allocated memory./ + So =size = 1= + + 3. Note that + =allocSizeExpr.getUnspecifiedType() as allocBaseType= + is wrong here. + + 4. These test cases all use type =char=. What would happen for =int= or + =double=? 
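As a way to think about hint 4, the following C sketch computes how many whole elements fit into an allocation. The helper name =element_capacity= is hypothetical, and the element sizes used in the note below (4 bytes for =int=, 8 bytes for =double=) are assumptions that hold on many platforms but are not guaranteed by the C standard.

```c
#include <stddef.h>

/* Number of whole elements of `elem_size` bytes that fit into an
 * allocation of `alloc_bytes` bytes. Returns 0 for a zero element size
 * to avoid dividing by zero. */
size_t element_capacity(size_t alloc_bytes, size_t elem_size)
{
    if (elem_size == 0)
        return 0;
    return alloc_bytes / elem_size;
}
```

Under those assumptions, =malloc(100)= holds 100 =char= elements, 25 =int= elements, or 12 =double= elements with 4 bytes left over. An index that is in bounds for =char= may therefore be far out of bounds for a wider element type, which is why the allocation and the access need to be expressed in the same unit before they are compared.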
+ +*** Solution + #+BEGIN_SRC java + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + // Step 6 + from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) + select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, 1 as allocBaseSize + #+END_SRC + +*** Results + 48 results in the table + + | 1 | call to malloc | 200 | access to array | 0 | 0 | 200 | 200 | char | 1 | 1 | + +** Step 7 + 1. Clean up the query. + 2. Add expressions for =allocatedUnits= (from the malloc) and a + =maxAccessedIndex= (from array accesses) + 3. Compare buffer allocation size to the access index. 
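The arithmetic behind these three items can be sketched in C. The function names below are hypothetical; they mirror the query variables =allocBaseSize=, =allocsize=, =arrayTypeSize= and =accessMax=, and the comparison follows the =>== test that the later steps of this session use.

```c
/* Size of the allocation in bytes: allocBaseSize * allocsize in the query. */
long allocated_units(long alloc_base_size, long alloc_size)
{
    return alloc_base_size * alloc_size;
}

/* Byte offset at which the highest accessed element starts:
 * arrayTypeSize * accessMax in the query. */
long max_accessed_index(long array_type_size, long access_max)
{
    return array_type_size * access_max;
}

/* Nonzero when the access starts at or beyond the end of the allocation. */
int at_or_beyond_size(long allocated, long accessed)
{
    return accessed >= allocated;
}
```

For a 200-byte =char= buffer, an access at index 200 gives 200 >= 200 and is flagged, while index 199 is not. Note that this comparison only looks at where the accessed element starts. With 8-byte elements in a 100-byte buffer, index 12 starts at byte 96 and would not be flagged even though the element would extend past the end; a stricter variant could compare =array_type_size * (access_max + 1)= against the allocation instead. The test cases here all use =char=, so the difference does not show up in the results.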
+ +*** Solution: + #+BEGIN_SRC java + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + // Step 7 + from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() + and + 1 = allocBaseSize + and + DataFlow::localExprFlow(buffer, access.getArrayBase()) + select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, allocSizeExpr, allocBaseSize * allocsize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex + #+END_SRC + +*** Results + 48 results in the much cleaner table + + | no. | buffer | bufferSizeExpr | access | accessMax | allocSizeExpr | allocatedUnits | maxAccessedIndex | | + | 1 | call to malloc | 200 | access to array | 0 | 200 | 200 | 0 | | + +** Step 8 + 1. Clean up the query. + 2. Compare buffer allocation size to the access index. + 3. Report only the questionable entries. + 4. Use + #+BEGIN_SRC java + /** + ,* @kind problem + ,*/ + #+END_SRC + to get nicer reporting. 
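The reporting step amounts to a filter followed by a message. A minimal C sketch of that idea is shown below; the function name =format_alert= is hypothetical, and the message text is modelled on the alert string used in this session's query.

```c
#include <stdio.h>

/* Writes an alert message into `out` and returns 1 when the access is at
 * or beyond the allocation size. Returns 0 and leaves `out` untouched for
 * in-bounds accesses, so only questionable entries produce a report. */
int format_alert(char *out, size_t out_len,
                 long allocated_units, long max_accessed_index)
{
    if (max_accessed_index < allocated_units)
        return 0;

    snprintf(out, out_len,
             "Array access at or beyond size; have %ld units, access at %ld",
             allocated_units, max_accessed_index);
    return 1;
}
```

In the query, the filter is the final condition in the =where= clause and the message is the second column of the =select=, which is the shape that =@kind problem= expects: an element followed by a string.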
+ +*** Solution: + #+BEGIN_SRC java + /** + ,* @kind problem + ,*/ + + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + // Step 8 + from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize, int accessMax, + int allocatedUnits, int maxAccessedIndex + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... 
buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + 1 = allocBaseSize and + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + upperBound(accessIdx) = accessMax and + allocBaseSize * allocsize = allocatedUnits and + arrayTypeSize * accessMax = maxAccessedIndex and + // only consider out-of-bounds + maxAccessedIndex >= allocatedUnits + select access, "Array access at or beyond size; have "+allocatedUnits + " units, access at "+ maxAccessedIndex + #+END_SRC + +*** Results + 14 results in the much cleaner table + + | Array access at or beyond size; have 200 units, access at 200 | db.c:67:5 | + +** Interim notes + A common issue with the =SimpleRangeAnalysis= library is the handling of + cases where the bounds cannot be determined at compile-time on one or more + paths. For example, even though certain paths have clearly defined + bounds, the range analysis library will define the =lowerBound= and + =upperBound= of =val= as =INT_MIN= and =INT_MAX= respectively: + + #+begin_src cpp + int val = rand() ? rand() : 30; + #+end_src + + A similar case is present in the =test_const_branch= and =test_const_branch2= + test-cases. In these cases, it is necessary to augment range analysis with + data-flow and restrict the bounds to the upper or lower bound of computable + constants that flow to a given expression. Another approach is global value + numbering, used next. + +** Step 9 -- GlobalValueNumbering + Range analysis won't bound =sz * x * y=, so switch to global value + numbering. 
+ This is the case in the last test case, + #+begin_example + void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) + { + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT + } + #+end_example + + Reference: + [[https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering/]] + + Global value numbering only knows that runtime values are equal; they + are not comparable (=<, >, <== etc.), and the /actual/ value is not + known. + + XX: global value numbering finds expressions with the same known value, + independent of structure. + + So, we look for and use /relative/ values between allocation and use. To do + this, use GVN. + + The relevant CodeQL constructs are + #+BEGIN_SRC java + import semmle.code.cpp.valuenumbering.GlobalValueNumbering + ... + globalValueNumber(e) = globalValueNumber(sizeExpr) and + e != sizeExpr + ... + #+END_SRC + + We can use global value numbering to identify common values as first step, but + for expressions like + #+begin_example + buf[sz * x * y - 1]; // COMPLIANT + #+end_example + we have to "evaluate" the expressions -- or at least bound them. + +*** interim + #+BEGIN_SRC java + /** + ,* @ kind problem + ,*/ + + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + import semmle.code.cpp.valuenumbering.GlobalValueNumbering + + // Step 9 + from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize, int accessMax, + int allocatedUnits, int maxAccessedIndex + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] 
+ // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + 1 = allocBaseSize and + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + upperBound(accessIdx) = accessMax and + allocBaseSize * allocsize = allocatedUnits and + arrayTypeSize * accessMax = maxAccessedIndex and + // only consider out-of-bounds + maxAccessedIndex >= allocatedUnits + select access, + "Array access at or beyond size; have " + allocatedUnits + " units, access at " + maxAccessedIndex, + globalValueNumber(accessIdx) as gvnAccess, globalValueNumber(allocSizeExpr) as gvnAlloc + #+END_SRC + +*** interim + Messy, start over. + + #+BEGIN_SRC java + + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + import semmle.code.cpp.valuenumbering.GlobalValueNumbering + + // Step 9 + from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, GVN gvnAccess, + GVN gvnAlloc + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // unsigned long size = 100; + // ... 
+ // char *buf = malloc(size); + DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // + // Use GVN + globalValueNumber(accessIdx) = gvnAccess and + globalValueNumber(allocSizeExpr) = gvnAlloc and + ( + gvnAccess = gvnAlloc + or + // buf[sz * x * y] above + // buf[sz * x * y + 1]; + exists(AddExpr add | + accessIdx = add and + // add.getAnOperand() = accessIdx and + add.getAnOperand().getValue().toInt() > 0 and + globalValueNumber(add.getAnOperand()) = gvnAlloc + ) + ) + select access, gvnAccess, gvnAlloc + #+END_SRC + +** TODO hashcons +import semmle.code.cpp.valuenumbering.HashCons + +hashcons: every value gets a number based on structure. Fails on #+begin_example char *buf = malloc(sz * x * y); sz = 100; buf[sz * x * y - 1]; // COMPLIANT #+end_example -XX: global value numbering to identify common values as first step, but -for expressions like - -#+begin_example -buf[sz * x * y - 1]; // COMPLIANT -#+end_example - -we have to "evaluate" the expressions -- or at least bound them. The final exercise is to implement the =isOffsetOutOfBoundsGVN= predicate to [...] - - - - diff --git a/session/session.ql b/session/session.ql index 9066819..bc98c07 100644 --- a/session/session.ql +++ b/session/session.ql @@ -3,124 +3,51 @@ */ import cpp - -// Ex.1 -// void test_const(void) -// void test_const_var(void) - -from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr -where - // malloc (100) - // ^^^^^^ in the AllocationExpr buffer - - // buf[...] 
- // ^^^ ArrayExpr access - - accessIdx = access.getArrayOffset().getValue().toInt() and - // malloc (100) - // ^^^ - allocSizeExpr.getValue().toInt() = bufferSize -select buffer, access, accessIdx - -// from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessOffset, int accessIdx, int elementSize, Expr allocSizeExpr -// where -// // malloc (100) -// // ^^^^^^ in the AllocationExpr buffer - -// // buf[...] -// // ^^^ ArrayExpr access - -// accessIdx = access.getArrayOffset().getValue().toInt() and -// elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and -// accessOffset = accessIdx * elementSize and -// // malloc (100) -// // ^^^ -// allocSizeExpr.getValue().toInt() = bufferSize -// select buffer, access - - -/* - * char *buf = malloc(100); - * buf[0]; // COMPLIANT - * buf[99]; // COMPLIANT - * buf[100]; // NON_COMPLIANT - * - * #define FACTOR 2 - * ... - * unsigned long size = 100 * FACTOR; - * char *buf = malloc(size); - * buf[0]; // COMPLIANT - * buf[99]; // COMPLIANT - * buf[size - 1]; // COMPLIANT - * buf[100]; // NON_COMPLIANT - * buf[size]; // NON_COMPLIANT - */ - import semmle.code.cpp.dataflow.DataFlow - -class BufferAccess extends ArrayExpr { - AllocationExpr buffer; - int bufferSize; - Expr offsetExpr; - BufferAccess() { - exists(Expr allocSizeExpr | - DataFlow::localExprFlow(buffer, this.getArrayBase()) and - offsetExpr = this.getArrayOffset() and - allocSizeExpr.getValue().toInt() = bufferSize and - DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr())) - } - - AllocationExpr getBuffer() { - result = buffer - } - - Expr getAccessExpr() { - result = offsetExpr - } - - int getBufferSize() { - result = bufferSize - } -} - -// predicate bufferAccess(AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessOffset) { -// exists(int accessIdx, int elementSize, Expr allocSizeExpr | -// DataFlow::localExprFlow(buffer, access.getArrayBase()) and -// accessIdx = 
access.getArrayOffset().getValue().toInt() and -// elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and -// accessOffset = accessIdx * elementSize and -// allocSizeExpr.getValue().toInt() = bufferSize and -// DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) -// ) -// } - - - -// from BufferAccess ba, int accessOffset, int bufferSize -// where upperBound(ba.getAccessExpr()) = accessOffset and -// bufferSize = ba.getBufferSize() and -// accessOffset >= bufferSize -// select ba, "Possible out of bounds access with offset " + accessOffset + " and size " + bufferSize - -// from AllocationExpr alloc, ArrayExpr access, Expr sizeExpr, Expr partOfAccess -// where alloc.getSizeExpr() = sizeExpr and -// ( -// // malloc(sz * x * y); -// // ... -// // buf[sz * x * y]; -// access.getArrayOffset() = partOfAccess -// or -// // buf[sz * x * y + 1]; -// exists(AddExpr add | -// access.getArrayOffset() = add and -// add.getAnOperand() = partOfAccess and -// add.getAnOperand().getValue().toInt() > 0 -// ) -// ) -// and -// partOfAccess != sizeExpr and -// globalValueNumber(partOfAccess) = globalValueNumber(sizeExpr) -// select sizeExpr, partOfAccess - -// import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -// import semmle.code.cpp.valuenumbering.GlobalValueNumbering +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis +import semmle.code.cpp.valuenumbering.GlobalValueNumbering + +// Step 9 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, GVN gvnAccess, + GVN gvnAlloc +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) and + // char *buf = ... 
buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // + // Use GVN + globalValueNumber(accessIdx) = gvnAccess and + globalValueNumber(allocSizeExpr) = gvnAlloc and + ( + gvnAccess = gvnAlloc + or + // buf[sz * x * y] above + // buf[sz * x * y + 1]; + exists(AddExpr add | + accessIdx = add and + // add.getAnOperand() = accessIdx and + add.getAnOperand().getValue().toInt() > 0 and + globalValueNumber(add.getAnOperand()) = gvnAlloc + ) + ) +select access, gvnAccess, gvnAlloc From 079b7d37173684b7049aaf114399702784763ed2 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Thu, 11 May 2023 17:15:58 -0700 Subject: [PATCH 07/28] Tangled (extracted) the embedded examples into separate files --- session/example1.ql | 22 +++++++++++++++++++ session/example2.ql | 32 ++++++++++++++++++++++++++++ session/example3.ql | 31 +++++++++++++++++++++++++++ session/example4.ql | 39 ++++++++++++++++++++++++++++++++++ session/example5.ql | 42 ++++++++++++++++++++++++++++++++++++ session/example6.ql | 40 ++++++++++++++++++++++++++++++++++ session/example7.ql | 44 ++++++++++++++++++++++++++++++++++++++ session/example8.ql | 52 +++++++++++++++++++++++++++++++++++++++++++++ session/session.org | 16 +++++++------- 9 files changed, 310 insertions(+), 8 deletions(-) create mode 100644 session/example1.ql create mode 100644 session/example2.ql create mode 100644 session/example3.ql create mode 100644 session/example4.ql create mode 100644 session/example5.ql create mode 100644 session/example6.ql create mode 100644 session/example7.ql create mode 100644 session/example8.ql diff --git a/session/example1.ql b/session/example1.ql new file mode 100644 index 0000000..a0e67e4 --- /dev/null +++ b/session/example1.ql @@ -0,0 +1,22 @@ +// Step 1 +// void test_const(void) +// void test_const_var(void) +from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr +where + // 
malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + bufferSize = allocSizeExpr.getValue().toInt() +select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr diff --git a/session/example2.ql b/session/example2.ql new file mode 100644 index 0000000..27c2eb4 --- /dev/null +++ b/session/example2.ql @@ -0,0 +1,32 @@ +import cpp +import semmle.code.cpp.dataflow.DataFlow + +// Step 2 +// void test_const(void) +// void test_const_var(void) +from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr +where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + bufferSize = allocSizeExpr.getValue().toInt() and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr diff --git a/session/example3.ql b/session/example3.ql new file mode 100644 index 0000000..f7dfc98 --- /dev/null +++ b/session/example3.ql @@ -0,0 +1,31 @@ +import cpp +import semmle.code.cpp.dataflow.DataFlow + +// Step 3 +// void test_const_var(void) +from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr +where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] 
+ // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + // bufferSize = allocSizeExpr.getValue().toInt() and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, access, accessIdx, access.getArrayOffset() diff --git a/session/example4.ql b/session/example4.ql new file mode 100644 index 0000000..8cdff66 --- /dev/null +++ b/session/example4.ql @@ -0,0 +1,39 @@ +import cpp +import semmle.code.cpp.dataflow.DataFlow + +// Step 4 +from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr, int bufferSize, Expr bse +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + // bufferSize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + exists(Expr bufferSizeExpr | + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + and bse = bufferSizeExpr + ) and + // char *buf = ... 
buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bse diff --git a/session/example5.ql b/session/example5.ql new file mode 100644 index 0000000..fccc1ae --- /dev/null +++ b/session/example5.ql @@ -0,0 +1,42 @@ +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +// Step 5 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... 
buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr diff --git a/session/example6.ql b/session/example6.ql new file mode 100644 index 0000000..4dc1e5e --- /dev/null +++ b/session/example6.ql @@ -0,0 +1,40 @@ +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +// Step 6 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... 
buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, 1 as allocBaseSize diff --git a/session/example7.ql b/session/example7.ql new file mode 100644 index 0000000..6ebb29d --- /dev/null +++ b/session/example7.ql @@ -0,0 +1,44 @@ +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +// Step 7 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... 
buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() + and + 1 = allocBaseSize + and + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, allocSizeExpr, allocBaseSize * allocsize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex diff --git a/session/example8.ql b/session/example8.ql new file mode 100644 index 0000000..ec2b598 --- /dev/null +++ b/session/example8.ql @@ -0,0 +1,52 @@ +/** + * @kind problem + */ + +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +// Step 8 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize, int accessMax, + int allocatedUnits, int maxAccessedIndex +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... 
buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + 1 = allocBaseSize and + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + upperBound(accessIdx) = accessMax and + allocBaseSize * allocsize = allocatedUnits and + arrayTypeSize * accessMax = maxAccessedIndex and + // only consider out-of-bounds + maxAccessedIndex >= allocatedUnits +select access, "Array access at or beyond size; have "+allocatedUnits + " units, access at "+ maxAccessedIndex diff --git a/session/session.org b/session/session.org index 0e3be46..8fe3abd 100644 --- a/session/session.org +++ b/session/session.org @@ -163,7 +163,7 @@ skeleton code, not a full source code repository. constant expression. *** Solution - #+BEGIN_SRC java + #+BEGIN_SRC java :tangle example1.ql // Step 1 // void test_const(void) // void test_const_var(void) @@ -209,7 +209,7 @@ To address these, take the query from the previous exercise and *** Solution - #+BEGIN_SRC java + #+BEGIN_SRC java :tangle example2.ql import cpp import semmle.code.cpp.dataflow.DataFlow @@ -269,7 +269,7 @@ To address these, take the query from the previous exercise and *** Solution - #+BEGIN_SRC java + #+BEGIN_SRC java :tangle example3.ql import cpp import semmle.code.cpp.dataflow.DataFlow @@ -320,7 +320,7 @@ To address these, take the query from the previous exercise and *** Solution - #+BEGIN_SRC java + #+BEGIN_SRC java :tangle example4.ql import cpp import semmle.code.cpp.dataflow.DataFlow @@ -413,7 +413,7 @@ XX: : import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis *** Solution - #+BEGIN_SRC java + #+BEGIN_SRC java :tangle example5.ql import cpp import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis @@ -491,7 +491,7 @@ XX: =double=? 
*** Solution - #+BEGIN_SRC java + #+BEGIN_SRC java :tangle example6.ql import cpp import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis @@ -546,7 +546,7 @@ XX: 3. Compare buffer allocation size to the access index. *** Solution: - #+BEGIN_SRC java + #+BEGIN_SRC java :tangle example7.ql import cpp import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis @@ -612,7 +612,7 @@ XX: to get nicer reporting. *** Solution: - #+BEGIN_SRC java + #+BEGIN_SRC java :tangle example8.ql /** ,* @kind problem ,*/ From 1540e66859eb1ad2b3224055b43d7024990bc379 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Thu, 11 May 2023 17:35:23 -0700 Subject: [PATCH 08/28] Add the test cases for the session examples --- session-tests/Example1/example1.expected | 12 ++++ session-tests/Example1/example1.qlref | 1 + session-tests/Example1/test.c | 85 ++++++++++++++++++++++++ session-tests/Example2/example2.expected | 3 + session-tests/Example2/example2.qlref | 1 + session-tests/Example2/test.c | 85 ++++++++++++++++++++++++ session-tests/Example3/example3.expected | 12 ++++ session-tests/Example3/example3.qlref | 1 + session-tests/Example3/test.c | 85 ++++++++++++++++++++++++ session-tests/Example4/example4.expected | 15 +++++ session-tests/Example4/example4.qlref | 1 + session-tests/Example4/test.c | 85 ++++++++++++++++++++++++ session-tests/Example5/example5.expected | 48 +++++++++++++ session-tests/Example5/example5.qlref | 1 + session-tests/Example5/test.c | 85 ++++++++++++++++++++++++ session-tests/Example6/example6.expected | 48 +++++++++++++ session-tests/Example6/example6.qlref | 1 + session-tests/Example6/test.c | 85 ++++++++++++++++++++++++ session-tests/Example7/example7.expected | 48 +++++++++++++ session-tests/Example7/example7.qlref | 1 + session-tests/Example7/test.c | 85 ++++++++++++++++++++++++ session-tests/Example8/example8.expected | 14 ++++ session-tests/Example8/example8.qlref | 1 + 
session-tests/Example8/test.c | 85 ++++++++++++++++++++++++ session-tests/qlpack.yml | 8 +++ session/example1.ql | 6 +- session/session.org | 44 ++++++------ 27 files changed, 921 insertions(+), 25 deletions(-) create mode 100644 session-tests/Example1/example1.expected create mode 100644 session-tests/Example1/example1.qlref create mode 100644 session-tests/Example1/test.c create mode 100644 session-tests/Example2/example2.expected create mode 100644 session-tests/Example2/example2.qlref create mode 100644 session-tests/Example2/test.c create mode 100644 session-tests/Example3/example3.expected create mode 100644 session-tests/Example3/example3.qlref create mode 100644 session-tests/Example3/test.c create mode 100644 session-tests/Example4/example4.expected create mode 100644 session-tests/Example4/example4.qlref create mode 100644 session-tests/Example4/test.c create mode 100644 session-tests/Example5/example5.expected create mode 100644 session-tests/Example5/example5.qlref create mode 100644 session-tests/Example5/test.c create mode 100644 session-tests/Example6/example6.expected create mode 100644 session-tests/Example6/example6.qlref create mode 100644 session-tests/Example6/test.c create mode 100644 session-tests/Example7/example7.expected create mode 100644 session-tests/Example7/example7.qlref create mode 100644 session-tests/Example7/test.c create mode 100644 session-tests/Example8/example8.expected create mode 100644 session-tests/Example8/example8.qlref create mode 100644 session-tests/Example8/test.c create mode 100644 session-tests/qlpack.yml diff --git a/session-tests/Example1/example1.expected b/session-tests/Example1/example1.expected new file mode 100644 index 0000000..99dad7a --- /dev/null +++ b/session-tests/Example1/example1.expected @@ -0,0 +1,12 @@ +| test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to 
array | 99 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:17:5:17:10 | access to array | 0 | test.c:17:9:17:9 | 0 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:18:5:18:11 | access to array | 99 | test.c:18:9:18:10 | 99 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:20:5:20:12 | access to array | 100 | test.c:20:9:20:11 | 100 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:35:5:35:10 | access to array | 0 | test.c:35:9:35:9 | 0 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:36:5:36:11 | access to array | 99 | test.c:36:9:36:10 | 99 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:38:5:38:12 | access to array | 100 | test.c:38:9:38:11 | 100 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:65:5:65:10 | access to array | 0 | test.c:65:9:65:9 | 0 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:66:5:66:12 | access to array | 100 | test.c:66:9:66:11 | 100 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:67:5:67:12 | access to array | 200 | test.c:67:9:67:11 | 200 | 100 | test.c:7:24:7:26 | 100 | diff --git a/session-tests/Example1/example1.qlref b/session-tests/Example1/example1.qlref new file mode 100644 index 0000000..71b7202 --- /dev/null +++ b/session-tests/Example1/example1.qlref @@ -0,0 +1 @@ +example1.ql diff --git a/session-tests/Example1/test.c b/session-tests/Example1/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example1/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = 
malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session-tests/Example2/example2.expected b/session-tests/Example2/example2.expected new file mode 100644 index 0000000..d6650b2 --- /dev/null +++ b/session-tests/Example2/example2.expected @@ -0,0 +1,3 @@ +| test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 99 | test.c:9:9:9:10 
| 99 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 | diff --git a/session-tests/Example2/example2.qlref b/session-tests/Example2/example2.qlref new file mode 100644 index 0000000..ab91f7f --- /dev/null +++ b/session-tests/Example2/example2.qlref @@ -0,0 +1 @@ +example2.ql diff --git a/session-tests/Example2/test.c b/session-tests/Example2/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example2/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 
100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session-tests/Example3/example3.expected b/session-tests/Example3/example3.expected new file mode 100644 index 0000000..4679be3 --- /dev/null +++ b/session-tests/Example3/example3.expected @@ -0,0 +1,12 @@ +| test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 0 | test.c:8:9:8:9 | 0 | +| test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 99 | test.c:9:9:9:10 | 99 | +| test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | test.c:10:9:10:11 | 100 | +| test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | 0 | test.c:17:9:17:9 | 0 | +| test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | 99 | test.c:18:9:18:10 | 99 | +| test.c:16:17:16:22 | call to malloc | test.c:20:5:20:12 | access to array | 100 | test.c:20:9:20:11 | 100 | +| 
test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | 0 | test.c:35:9:35:9 | 0 | +| test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | 99 | test.c:36:9:36:10 | 99 | +| test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | 100 | test.c:38:9:38:11 | 100 | +| test.c:63:17:63:22 | call to malloc | test.c:65:5:65:10 | access to array | 0 | test.c:65:9:65:9 | 0 | +| test.c:63:17:63:22 | call to malloc | test.c:66:5:66:12 | access to array | 100 | test.c:66:9:66:11 | 100 | +| test.c:63:17:63:22 | call to malloc | test.c:67:5:67:12 | access to array | 200 | test.c:67:9:67:11 | 200 | diff --git a/session-tests/Example3/example3.qlref b/session-tests/Example3/example3.qlref new file mode 100644 index 0000000..949ef16 --- /dev/null +++ b/session-tests/Example3/example3.qlref @@ -0,0 +1 @@ +example3.ql diff --git a/session-tests/Example3/test.c b/session-tests/Example3/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example3/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 
100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session-tests/Example4/example4.expected b/session-tests/Example4/example4.expected new file mode 100644 index 0000000..88b6955 --- /dev/null +++ b/session-tests/Example4/example4.expected @@ -0,0 +1,15 @@ +| test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 99 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 | +| test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | 0 | test.c:17:9:17:9 | 0 | 100 | test.c:15:26:15:28 | 100 | +| test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | 99 | 
test.c:18:9:18:10 | 99 | 100 | test.c:15:26:15:28 | 100 | +| test.c:16:17:16:22 | call to malloc | test.c:20:5:20:12 | access to array | 100 | test.c:20:9:20:11 | 100 | 100 | test.c:15:26:15:28 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | 0 | test.c:35:9:35:9 | 0 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | 0 | test.c:35:9:35:9 | 0 | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | 99 | test.c:36:9:36:10 | 99 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | 99 | test.c:36:9:36:10 | 99 | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | 100 | test.c:38:9:38:11 | 100 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | 100 | test.c:38:9:38:11 | 100 | 200 | test.c:26:45:26:47 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:65:5:65:10 | access to array | 0 | test.c:65:9:65:9 | 0 | 200 | test.c:55:22:55:24 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:66:5:66:12 | access to array | 100 | test.c:66:9:66:11 | 100 | 200 | test.c:55:22:55:24 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:67:5:67:12 | access to array | 200 | test.c:67:9:67:11 | 200 | 200 | test.c:55:22:55:24 | 200 | diff --git a/session-tests/Example4/example4.qlref b/session-tests/Example4/example4.qlref new file mode 100644 index 0000000..b14578e --- /dev/null +++ b/session-tests/Example4/example4.qlref @@ -0,0 +1 @@ +example4.ql diff --git a/session-tests/Example4/test.c b/session-tests/Example4/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example4/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void 
test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session-tests/Example5/example5.expected b/session-tests/Example5/example5.expected new file mode 100644 index 0000000..1b5c7e7 --- /dev/null +++ b/session-tests/Example5/example5.expected @@ -0,0 +1,48 @@ +| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:8:5:8:10 | access to array | 0.0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to 
malloc | test.c:7:24:7:26 | 100 | test.c:9:5:9:11 | access to array | 99.0 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:10:5:10:12 | access to array | 100.0 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:17:5:17:10 | access to array | 0.0 | test.c:17:9:17:9 | 0 | 100 | test.c:15:26:15:28 | 100 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:18:5:18:11 | access to array | 99.0 | test.c:18:9:18:10 | 99 | 100 | test.c:15:26:15:28 | 100 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:19:5:19:17 | access to array | 99.0 | test.c:19:9:19:16 | ... - ... | 100 | test.c:15:26:15:28 | 100 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:20:5:20:12 | access to array | 100.0 | test.c:20:9:20:11 | 100 | 100 | test.c:15:26:15:28 | 100 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:21:5:21:13 | access to array | 100.0 | test.c:21:9:21:12 | size | 100 | test.c:15:26:15:28 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... 
| 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... 
| 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... 
| 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... 
| 200 | test.c:26:45:26:47 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:65:5:65:10 | access to array | 0.0 | test.c:65:9:65:9 | 0 | 200 | test.c:55:22:55:24 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:66:5:66:12 | access to array | 100.0 | test.c:66:9:66:11 | 100 | 200 | test.c:55:22:55:24 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:67:5:67:12 | access to array | 200.0 | test.c:67:9:67:11 | 200 | 200 | test.c:55:22:55:24 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:68:5:68:23 | access to array | 1.8446744073709552E19 | test.c:68:9:68:22 | ... - ... | 200 | test.c:55:22:55:24 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:69:5:69:19 | access to array | 1.8446744073709552E19 | test.c:69:9:69:18 | alloc_size | 200 | test.c:55:22:55:24 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:73:9:73:23 | access to array | 198.0 | test.c:73:13:73:22 | alloc_size | 200 | test.c:55:22:55:24 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:74:9:74:27 | access to array | 199.0 | test.c:74:13:74:26 | ... + ... | 200 | test.c:55:22:55:24 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:75:9:75:27 | access to array | 200.0 | test.c:75:13:75:26 | ... + ... 
| 200 | test.c:55:22:55:24 | 200 | diff --git a/session-tests/Example5/example5.qlref b/session-tests/Example5/example5.qlref new file mode 100644 index 0000000..5419e8b --- /dev/null +++ b/session-tests/Example5/example5.qlref @@ -0,0 +1 @@ +example5.ql diff --git a/session-tests/Example5/test.c b/session-tests/Example5/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example5/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = 
malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session-tests/Example6/example6.expected b/session-tests/Example6/example6.expected new file mode 100644 index 0000000..f674929 --- /dev/null +++ b/session-tests/Example6/example6.expected @@ -0,0 +1,48 @@ +| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:8:5:8:10 | access to array | 0.0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:9:5:9:11 | access to array | 99.0 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:10:5:10:12 | access to array | 100.0 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:17:5:17:10 | access to array | 0.0 | test.c:17:9:17:9 | 0 | 100 | test.c:15:26:15:28 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:18:5:18:11 | access to array | 99.0 | test.c:18:9:18:10 | 99 | 100 | test.c:15:26:15:28 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:19:5:19:17 | access to array | 99.0 | test.c:19:9:19:16 | ... - ... 
| 100 | test.c:15:26:15:28 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:20:5:20:12 | access to array | 100.0 | test.c:20:9:20:11 | 100 | 100 | test.c:15:26:15:28 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:21:5:21:13 | access to array | 100.0 | test.c:21:9:21:12 | size | 100 | test.c:15:26:15:28 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... 
| 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... 
| 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... 
| 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... 
| 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:65:5:65:10 | access to array | 0.0 | test.c:65:9:65:9 | 0 | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:66:5:66:12 | access to array | 100.0 | test.c:66:9:66:11 | 100 | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:67:5:67:12 | access to array | 200.0 | test.c:67:9:67:11 | 200 | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:68:5:68:23 | access to array | 1.8446744073709552E19 | test.c:68:9:68:22 | ... - ... | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:69:5:69:19 | access to array | 1.8446744073709552E19 | test.c:69:9:69:18 | alloc_size | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:73:9:73:23 | access to array | 198.0 | test.c:73:13:73:22 | alloc_size | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:74:9:74:27 | access to array | 199.0 | test.c:74:13:74:26 | ... + ... | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:75:9:75:27 | access to array | 200.0 | test.c:75:13:75:26 | ... + ... 
| 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | diff --git a/session-tests/Example6/example6.qlref b/session-tests/Example6/example6.qlref new file mode 100644 index 0000000..2c38a28 --- /dev/null +++ b/session-tests/Example6/example6.qlref @@ -0,0 +1 @@ +example6.ql diff --git a/session-tests/Example6/test.c b/session-tests/Example6/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example6/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 
100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session-tests/Example7/example7.expected b/session-tests/Example7/example7.expected new file mode 100644 index 0000000..269d633 --- /dev/null +++ b/session-tests/Example7/example7.expected @@ -0,0 +1,48 @@ +| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:8:5:8:10 | access to array | 0.0 | test.c:7:24:7:26 | 100 | 100 | 0.0 | +| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:9:5:9:11 | access to array | 99.0 | test.c:7:24:7:26 | 100 | 100 | 99.0 | +| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:10:5:10:12 | access to array | 100.0 | test.c:7:24:7:26 | 100 | 100 | 100.0 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:17:5:17:10 | access to array | 0.0 | test.c:15:26:15:28 | 100 | 100 | 0.0 | +| test.c:16:17:16:22 | call to malloc | 
test.c:15:26:15:28 | 100 | test.c:18:5:18:11 | access to array | 99.0 | test.c:15:26:15:28 | 100 | 100 | 99.0 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:19:5:19:17 | access to array | 99.0 | test.c:15:26:15:28 | 100 | 100 | 99.0 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:20:5:20:12 | access to array | 100.0 | test.c:15:26:15:28 | 100 | 100 | 100.0 | +| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:21:5:21:13 | access to array | 100.0 | test.c:15:26:15:28 | 100 | 100 | 100.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:26:39:26:41 | 100 | 100 | 0.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:26:45:26:47 | 200 | 200 | 0.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:26:39:26:41 | 100 | 100 | 99.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:26:45:26:47 | 200 | 200 | 99.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | test.c:26:39:26:41 | 100 | 100 | 299.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | test.c:26:45:26:47 | 200 | 200 | 299.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:26:39:26:41 | 100 | 100 | 100.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:26:45:26:47 | 200 | 200 | 100.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:26:39:26:41 | 100 | 100 | 300.0 | +| test.c:28:17:28:22 | call to malloc | 
test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:26:45:26:47 | 200 | 200 | 300.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:26:39:26:41 | 100 | 100 | 198.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:26:45:26:47 | 200 | 200 | 198.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:26:39:26:41 | 100 | 100 | 199.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:26:45:26:47 | 200 | 200 | 199.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:26:39:26:41 | 100 | 100 | 200.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:26:45:26:47 | 200 | 200 | 200.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:26:39:26:41 | 100 | 100 | 0.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:26:45:26:47 | 200 | 200 | 0.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:26:39:26:41 | 100 | 100 | 99.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:26:45:26:47 | 200 | 200 | 99.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:26:39:26:41 | 100 | 100 | 299.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:26:45:26:47 | 200 | 200 | 299.0 | +| test.c:28:17:28:22 | call to 
malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:26:39:26:41 | 100 | 100 | 100.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:26:45:26:47 | 200 | 200 | 100.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:26:39:26:41 | 100 | 100 | 300.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:26:45:26:47 | 200 | 200 | 300.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:26:39:26:41 | 100 | 100 | 198.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:26:45:26:47 | 200 | 200 | 198.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:26:39:26:41 | 100 | 100 | 199.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:26:45:26:47 | 200 | 200 | 199.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:26:39:26:41 | 100 | 100 | 200.0 | +| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:26:45:26:47 | 200 | 200 | 200.0 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:65:5:65:10 | access to array | 0.0 | test.c:55:22:55:24 | 200 | 200 | 0.0 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:66:5:66:12 | access to array | 100.0 | test.c:55:22:55:24 | 200 | 200 | 100.0 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:67:5:67:12 | access to array | 200.0 | test.c:55:22:55:24 | 200 | 200 | 200.0 | +| 
test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:68:5:68:23 | access to array | 1.8446744073709552E19 | test.c:55:22:55:24 | 200 | 200 | 1.8446744073709552E19 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:69:5:69:19 | access to array | 1.8446744073709552E19 | test.c:55:22:55:24 | 200 | 200 | 1.8446744073709552E19 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:73:9:73:23 | access to array | 198.0 | test.c:55:22:55:24 | 200 | 200 | 198.0 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:74:9:74:27 | access to array | 199.0 | test.c:55:22:55:24 | 200 | 200 | 199.0 | +| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:75:9:75:27 | access to array | 200.0 | test.c:55:22:55:24 | 200 | 200 | 200.0 | diff --git a/session-tests/Example7/example7.qlref b/session-tests/Example7/example7.qlref new file mode 100644 index 0000000..5b0fec2 --- /dev/null +++ b/session-tests/Example7/example7.qlref @@ -0,0 +1 @@ +example7.ql diff --git a/session-tests/Example7/test.c b/session-tests/Example7/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example7/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 
100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session-tests/Example8/example8.expected b/session-tests/Example8/example8.expected new file mode 100644 index 0000000..abca8e7 --- /dev/null +++ b/session-tests/Example8/example8.expected @@ -0,0 +1,14 @@ +| test.c:10:5:10:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | +| test.c:20:5:20:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | +| test.c:21:5:21:13 | access to array | Array access at or beyond size; have 100 units, access at 100 | +| test.c:37:5:37:17 | access to array | Array access at or beyond size; have 100 units, access at 299 | +| test.c:37:5:37:17 | access to array | Array access at or beyond size; have 200 units, access at 299 | +| test.c:38:5:38:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | +| 
test.c:39:5:39:13 | access to array | Array access at or beyond size; have 100 units, access at 300 | +| test.c:39:5:39:13 | access to array | Array access at or beyond size; have 200 units, access at 300 | +| test.c:43:9:43:17 | access to array | Array access at or beyond size; have 100 units, access at 198 | +| test.c:44:9:44:21 | access to array | Array access at or beyond size; have 100 units, access at 199 | +| test.c:45:9:45:21 | access to array | Array access at or beyond size; have 100 units, access at 200 | +| test.c:45:9:45:21 | access to array | Array access at or beyond size; have 200 units, access at 200 | +| test.c:67:5:67:12 | access to array | Array access at or beyond size; have 200 units, access at 200 | +| test.c:75:9:75:27 | access to array | Array access at or beyond size; have 200 units, access at 200 | diff --git a/session-tests/Example8/example8.qlref b/session-tests/Example8/example8.qlref new file mode 100644 index 0000000..20f0e3f --- /dev/null +++ b/session-tests/Example8/example8.qlref @@ -0,0 +1 @@ +example8.ql diff --git a/session-tests/Example8/test.c b/session-tests/Example8/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example8/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 
100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session-tests/qlpack.yml b/session-tests/qlpack.yml new file mode 100644 index 0000000..b191d63 --- /dev/null +++ b/session-tests/qlpack.yml @@ -0,0 +1,8 @@ +--- +library: false +name: session-tests +version: 0.0.1 +dependencies: + "session": "*" +extractor: cpp +tests: . 
\ No newline at end of file diff --git a/session/example1.ql b/session/example1.ql index a0e67e4..4d26746 100644 --- a/session/example1.ql +++ b/session/example1.ql @@ -1,6 +1,6 @@ -// Step 1 -// void test_const(void) -// void test_const_var(void) +import cpp +import semmle.code.cpp.dataflow.DataFlow + from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr where // malloc (100) diff --git a/session/session.org b/session/session.org index 8fe3abd..1296081 100644 --- a/session/session.org +++ b/session/session.org @@ -164,28 +164,28 @@ skeleton code, not a full source code repository. *** Solution #+BEGIN_SRC java :tangle example1.ql -// Step 1 -// void test_const(void) -// void test_const_var(void) -from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr -where - // malloc (100) - // ^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] - // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset().getValue().toInt() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - allocSizeExpr = buffer.(Call).getArgument(0) and - bufferSize = allocSizeExpr.getValue().toInt() -select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr + import cpp + import semmle.code.cpp.dataflow.DataFlow + + from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr + where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + bufferSize = allocSizeExpr.getValue().toInt() + select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr #+END_SRC This produces 12 results, with some cross-function pairs. 
From ded9d3474ec7621dc58bb8d498ff7f10700dbb17 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Mon, 15 May 2023 18:24:24 -0700 Subject: [PATCH 09/28] Include session examples rather than inlining. Clear up some session commentary --- session-tests/Example9/example9.expected | 8 + session-tests/Example9/example9.qlref | 1 + session-tests/Example9/test.c | 85 +++ session/example9.ql | 49 ++ session/session.md | 930 +++++++++++++++++++++++ session/session.org | 441 ++--------- 6 files changed, 1116 insertions(+), 398 deletions(-) create mode 100644 session-tests/Example9/example9.expected create mode 100644 session-tests/Example9/example9.qlref create mode 100644 session-tests/Example9/test.c create mode 100644 session/example9.ql create mode 100644 session/session.md diff --git a/session-tests/Example9/example9.expected b/session-tests/Example9/example9.expected new file mode 100644 index 0000000..2936e3d --- /dev/null +++ b/session-tests/Example9/example9.expected @@ -0,0 +1,8 @@ +| test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:15:26:15:28 | GVN | +| test.c:38:5:38:12 | access to array | test.c:26:39:26:41 | GVN | test.c:26:39:26:41 | GVN | +| test.c:69:5:69:19 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | GVN | +| test.c:73:9:73:23 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | GVN | +| test.c:74:9:74:27 | access to array | test.c:74:13:74:26 | GVN | test.c:63:24:63:33 | GVN | +| test.c:75:9:75:27 | access to array | test.c:75:13:75:26 | GVN | test.c:63:24:63:33 | GVN | +| test.c:83:5:83:19 | access to array | test.c:81:24:81:33 | GVN | test.c:81:24:81:33 | GVN | +| test.c:84:5:84:23 | access to array | test.c:84:9:84:22 | GVN | test.c:81:24:81:33 | GVN | diff --git a/session-tests/Example9/example9.qlref b/session-tests/Example9/example9.qlref new file mode 100644 index 0000000..2d30c37 --- /dev/null +++ b/session-tests/Example9/example9.qlref @@ -0,0 +1 @@ +example9.ql diff --git 
a/session-tests/Example9/test.c b/session-tests/Example9/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example9/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session/example9.ql b/session/example9.ql new file mode 100644 index 0000000..382ea0b --- /dev/null +++ 
b/session/example9.ql @@ -0,0 +1,49 @@ +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis +import semmle.code.cpp.valuenumbering.GlobalValueNumbering + +// Step 9 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, GVN gvnAccess, + GVN gvnAlloc +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // + // Use GVN + globalValueNumber(accessIdx) = gvnAccess and + globalValueNumber(allocSizeExpr) = gvnAlloc and + ( + gvnAccess = gvnAlloc + or + // buf[sz * x * y] above + // buf[sz * x * y + 1]; + exists(AddExpr add | + accessIdx = add and + // add.getAnOperand() = accessIdx and + add.getAnOperand().getValue().toInt() > 0 and + globalValueNumber(add.getAnOperand()) = gvnAlloc + ) + ) +select access, gvnAccess, gvnAlloc diff --git a/session/session.md b/session/session.md new file mode 100644 index 0000000..ea6f975 --- /dev/null +++ b/session/session.md @@ -0,0 +1,930 @@ +- [CodeQL Workshop — Using Data-Flow and Range Analysis to Find Out-Of-Bounds Accesses](#codeql-workshop--using-data-flow-and-range-analysis-to-find-out-of-bounds-accesses) +- [Acknowledgments](#acknowledgments) +- [Setup Instructions](#setup-instructions) +- [Introduction](#introduction) +- [A Note on the Scope of This Workshop](#a-note-on-the-scope-of-this-workshop) +- [Session/Workshop notes](#sessionworkshop-notes) + - [Step 1](#exercise-1) + - [Hints](#hints) + - [Solution](#org8ca1443) + - [Step 2](#org6138b3d) + - 
[Hints](#hints) + - [Solution](#org287ad06) + - [Results](#org4b8509f) + - [Step 3](#exercise-2) + - [Solution](#orga37db88) + - [Results](#org22d1a25) + - [Step 4](#org493babd) + - [Hint](#org57d9881) + - [Solution](#org9303851) + - [Results](#org9ba681e) + - [Step 5 – SimpleRangeAnalysis](#orgda84218) + - [Solution](#orgb5a7df0) + - [Results](#orgf04ac53) + - [Step 6](#orgd9ab97c) + - [Solution](#org79d9ce3) + - [Results](#org00d27a6) + - [Step 7](#org4bfd9c3) + - [Solution:](#orgf500bdf) + - [Results](#org07a41ff) + - [Step 8](#orgd642b5f) + - [Solution:](#org696e813) + - [Results](#org77abe31) + - [Interim notes](#org03ebd84) + - [Step 9 – GlobalValueNumbering](#org29bb594) + - [interim](#orgfc8f904) + - [interim](#org53cf2e1) + - [hashcons](#org7ccef88) + + + + +# CodeQL Workshop — Using Data-Flow and Range Analysis to Find Out-Of-Bounds Accesses + + + + +# Acknowledgments + +This session-based workshop is based on the exercise/unit-test-based material at , which in turn is based on a significantly simplified and modified version of the [OutOfBounds.qll library](https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll) from the [CodeQL Coding Standards repository](https://github.com/github/codeql-coding-standards). + + + + +# Setup Instructions + +- Install [Visual Studio Code](https://code.visualstudio.com/). + +- Install the [CodeQL extension for Visual Studio Code](https://codeql.github.com/docs/codeql-for-visual-studio-code/setting-up-codeql-in-visual-studio-code/). + +- Install the latest version of the [CodeQL CLI](https://github.com/github/codeql-cli-binaries/releases). + +- Clone this repository: + + ```sh + git clone https://github.com/hohn/codeql-workshop-runtime-values-c + ``` + +- Install the CodeQL pack dependencies using the command `CodeQL: Install Pack Dependencies` and select `exercises`, `solutions`, `exercises-tests`, `session`, `session-db` and `solutions-tests` from the list of packs. 
+ +- If you have CodeQL on your PATH, build the database using `build-database.sh` and load the database with the VS Code CodeQL extension. It is at `session-db/cpp-runtime-values-db`. + - Alternatively, you can download [this pre-built database](https://drive.google.com/file/d/1N8TYJ6f4E33e6wuyorWHZHVCHBZy8Bhb/view?usp=sharing). + +- If you do **not** have CodeQL on your PATH, build the database using the unit test system. Choose the `TESTING` tab in VS Code, run the `session-db/DB/db.qlref` test. The test will fail, but it leaves a usable CodeQL database in `session-db/DB/DB.testproj`. + +- ❗Important❗: Run `initialize-qltests.sh` to initialize the tests. Otherwise, you will not be able to run the QLTests in `exercises-tests`. + + + + +# Introduction + +This workshop focuses on analyzing and relating two values — array access indices and memory allocation sizes — in order to identify simple cases of out-of-bounds array accesses. + +The following snippets demonstrate how an out-of-bounds array access can occur: + +```cpp +char* buffer = malloc(10); +buffer[9] = 'a'; // ok +buffer[10] = 'b'; // out-of-bounds +``` + +A more complex example: + +```cpp +char* buffer; +if(rand() == 1) { + buffer = malloc(10); +} +else { + buffer = malloc(11); +} +size_t index = 0; +if(rand() == 1) { + index = 10; +} +buffer[index]; // potentially out-of-bounds depending on control-flow +``` + +Another common case *not* covered in this introductory workshop involves loops, as follows: + +```cpp +int elements[5]; +for (int i = 0; i <= 5; ++i) { + elements[i] = 0; +} +``` + +To find these issues, we can implement an analysis that tracks the upper or lower bounds on an expression and, combined with data-flow analysis to reduce false-positives, identifies cases where the index of the array results in an access beyond the allocated size of the buffer.
+ + + + +# A Note on the Scope of This Workshop + +This workshop is not intended to be a complete analysis that is useful for real-world cases of out-of-bounds analyses for reasons including but not limited to: + +- Missing support for loops and recursion +- No interprocedural analysis +- Missing size calculation of arrays where the element size is not 1 +- No support for pointer arithmetic or in general, operations other than addition and subtraction +- Overly specific modelling of a buffer access as an array expression + +The goal of this workshop is rather to demonstrate the building blocks of analyzing run-time values and how to apply those building blocks to modelling a common class of vulnerability. A more comprehensive and production-appropriate example is the [OutOfBounds.qll library](https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll) from the [CodeQL Coding Standards repository](https://github.com/github/codeql-coding-standards). + + + + +# Session/Workshop notes + +Unlike the [exercises](../README.md#org3b74422) which use the *collection* of test problems in `exercises-test`, this workshop is a sequential session following the actual process of writing CodeQL: use a *single* database built from a single, larger segment of code and inspect the query results as you write the query. + +For this workshop, the larger segment of code is still simplified skeleton code, not a full source code repository. + +The queries are embedded in \`session.md\` but can also be found in the \`example\*.ql\` files. They can all be run as test cases in VS Code. + + + + +## Step 1 + +In the first step we are going to + +1. identify a dynamic allocation with `malloc` and +2. an access to that allocated buffer. The access is via an array expression; we are **not** going to cover pointer dereferencing. + +The goal of this exercise is to then output the array access, array size, buffer, and buffer offset.
+ +The focus here is on + + void test_const(void) + +and + + void test_const_var(void) + +in [db.c](file:///Users/hohn/local/codeql-workshop-runtime-values-c/session-db/DB/db.c). + + + + +### Hints + +1. `Expr.getValue().toInt()` can be used to get the integer value of a constant expression. + + + + +### Solution + +```java +import cpp +import semmle.code.cpp.dataflow.DataFlow + +from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr +where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + bufferSize = allocSizeExpr.getValue().toInt() +select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr +``` + +This produces 12 results, with some cross-function pairs. + + + + +## Step 2 + +The previous query fails to connect the `malloc` calls with the array accesses, and in the results, `malloc` calls from one function are paired with accesses in another. + +To address these, take the query from the previous exercise and + +1. connect the allocation(s) with the +2. array accesses + + + + +### Hints + +1. Use `DataFlow::localExprFlow()` to relate the allocated buffer to the array base. +2. The array base is the `buf` part of `buf[0]`. Use the `ArrayExpr.getArrayBase()` predicate. + + + + +### Solution + +```java +import cpp +import semmle.code.cpp.dataflow.DataFlow + +// Step 2 +// void test_const(void) +// void test_const_var(void) +from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr +where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...]
+ // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + bufferSize = allocSizeExpr.getValue().toInt() and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr +``` + + + + +### Results + +There are now 3 results. These are from only one function, the one using constants. + + + + +## Step 3 + +The previous results need to be extended to the case + +```c++ +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + ... +} +``` + +Here, the `malloc` argument is a variable with known value. + +We include this result by removing the size-retrieval from the prior query. + + + + +### Solution + +```java +import cpp +import semmle.code.cpp.dataflow.DataFlow + +// Step 3 +// void test_const_var(void) +from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr +where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + // bufferSize = allocSizeExpr.getValue().toInt() and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, access, accessIdx, access.getArrayOffset() +``` + + + + +### Results + +Now, we get 12 results, including some from other test cases. + + + + +## Step 4 + +We are looking for out-of-bounds accesses, so we need to include the bounds.
But in a more general way than looking only at constant values. + +Note the results for the cases in `test_const_var` which involve a variable access rather than a constant. The next goal is + +1. to handle the case where the allocation size or array index are variables (with constant values) rather than integer constants. + +We have an expression `size` that flows into the `malloc()` call. + + + + +### Hint + + + + +### Solution + +```java +import cpp +import semmle.code.cpp.dataflow.DataFlow + +// Step 4 +from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr, int bufferSize, Expr bse +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + // bufferSize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + exists(Expr bufferSizeExpr | + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + and bse = bufferSizeExpr + ) and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bse +``` + + + + +### Results + +Now, we get 15 results, limited to statically determined values. + +XX: Implement predicates `getSourceConstantExpr`, `getFixedSize`, and `getFixedArrayOffset` Use local data-flow analysis to complete the `getSourceConstantExpr` predicate. The `getFixedSize` and `getFixedArrayOffset` predicates can be completed using `getSourceConstantExpr`. + +XX: + +1. start with query. 
`elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()` +2. convert to predicate. +3. then use classes, if desired. `class BufferAccess extends ArrayExpr` is different from those below. + + + + +## Step 5 – SimpleRangeAnalysis + +Running the query from Step 2 against the database yields a significant number of missing or incorrect results. The reason is that although great at identifying compile-time constants and their use, data-flow analysis is not always the right tool for identifying the *range* of values an `Expr` might have, particularly when multiple potential constants might flow to an `Expr`. + +The range analysis already handles conditional branches; we don't have to use guards on data flow – don't implement your own interpreter if you can use the library. + +The CodeQL standard library has several mechanisms for addressing this problem; in the remainder of this workshop we will explore two of them: `SimpleRangeAnalysis` and, later, `GlobalValueNumbering`. + +Although not in the scope of this workshop, a standard use-case for range analysis is reliably identifying integer overflow and validating integer overflow checks. + +First, simplify the `from...where...select`: + +1. Remove unnecessary `exists` clauses. +2. Use `DataFlow::localExprFlow` for the buffer and allocation sizes, not `getValue().toInt()` + +Then, add the use of the `SimpleRangeAnalysis` library. Specifically, the relevant library predicates are `upperBound` and `lowerBound`, to be used with the buffer access argument. Experiment and decide which to use for this exercise (`upperBound`, `lowerBound`, or both). 
+ +This requires the import + + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + + + +### Solution + +```java +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +// Step 5 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr +``` + + + + +### Results + +Now, we get 48 results. + + + + +## Step 6 + +To finally determine (some) out-of-bounds accesses, we have to convert allocation units (usually in bytes) to size units. Then we are finally in a position to compare buffer allocation size to the access index to find out-of-bounds accesses – at least for expressions with known values. + +Add these to the query: + +1. Convert allocation units to size units. +2. Convert access units to the same size units. + +Hints: + +1. We need the size of the array element. 
Use `access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType()` to see the type and `access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()` to get its size. + +2. Note from the docs: *The malloc() function allocates size bytes of memory and returns a pointer to the allocated memory.* So `size = 1` + +3. Note that `allocSizeExpr.getUnspecifiedType() as allocBaseType` is wrong here. + +4. These test cases all use type `char`. What would happen for `int` or `double`? + + + + +### Solution + +```java +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +// Step 6 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... 
buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, 1 as allocBaseSize +``` + + + + +### Results + +48 results in the table + +| | | | | | | | | | | | +|--- |-------------- |--- |--------------- |--- |--- |--- |--- |---- |--- |--- | +| 1 | call to malloc | 200 | access to array | 0 | 0 | 200 | 200 | char | 1 | 1 | + + + + +## Step 7 + +1. Clean up the query. +2. Add expressions for `allocatedUnits` (from the malloc) and a `maxAccessedIndex` (from array accesses) +3. Compare buffer allocation size to the access index. + + + + +### Solution: + +```java +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +// Step 7 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... 
buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() + and + 1 = allocBaseSize + and + DataFlow::localExprFlow(buffer, access.getArrayBase()) +select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, allocSizeExpr, allocBaseSize * allocsize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex +``` + + + + +### Results + +48 results in the much cleaner table + +| no. | buffer | bufferSizeExpr | access | accessMax | allocSizeExpr | allocatedUnits | maxAccessedIndex | | +| 1 | call to malloc | 200 | access to array | 0 | 200 | 200 | 0 | | + + + + +## Step 8 + +1. Clean up the query. +2. Compare buffer allocation size to the access index. +3. Report only the questionable entries. +4. Use + + ```java + /** + * @kind problem + */ + ``` + + to get nicer reporting. + + + + +### Solution: + +```java +/** + * @kind problem + */ + +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +// Step 8 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize, int accessMax, + int allocatedUnits, int maxAccessedIndex +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... 
+ // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + 1 = allocBaseSize and + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + upperBound(accessIdx) = accessMax and + allocBaseSize * allocsize = allocatedUnits and + arrayTypeSize * accessMax = maxAccessedIndex and + // only consider out-of-bounds + maxAccessedIndex >= allocatedUnits +select access, "Array access at or beyond size; have "+allocatedUnits + " units, access at "+ maxAccessedIndex +``` + + + + +### Results + +14 results in the much cleaner table + +| | | +|------------------------------------------------------------- |--------- | +| Array access at or beyond size; have 200 units, access at 200 | db.c:67:5 | + + + + +## Interim notes + +A common issue with the `SimpleRangeAnalysis` library is handling of cases where the bounds are undeterminable at compile-time on one or more paths. For example, even though certain paths have clearly defined bounds, the range analysis library will define the `lowerBound` and `upperBound` of `val` as `INT_MIN` and `INT_MAX` respectively: + +```cpp +int val = rand() ? rand() : 30; +``` + +A similar case is present in the `test_const_branch` and `test_const_branch2` test-cases. In these cases, it is necessary to augment range analysis with data-flow and restrict the bounds to the upper or lower bound of computable constants that flow to a given expression. Another approach is global value numbering, used next. + + + + +## Step 9 – GlobalValueNumbering + +Range analysis won't bound `sz * x * y`, so switch to global value numbering.
This is the case in the last test case, + + void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) + { + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT + } + +Reference: + +Global value numbering only knows that runtime values are equal; they are not comparable (`<, >, <=` etc.), and the *actual* value is not known. + +XX: global value numbering finds expressions with the same known value, independent of structure. + +So, we look for and use *relative* values between allocation and use. To do this, use GVN. + +The relevant CodeQL constructs are + +```java +import semmle.code.cpp.valuenumbering.GlobalValueNumbering +... +globalValueNumber(e) = globalValueNumber(sizeExpr) and +e != sizeExpr +... +``` + +We can use global value numbering to identify common values as first step, but for expressions like + + buf[sz * x * y - 1]; // COMPLIANT + +we have to "evaluate" the expressions – or at least bound them. + + + + +### interim + +```java +/** + * @ kind problem + */ + +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis +import semmle.code.cpp.valuenumbering.GlobalValueNumbering + +// Step 9 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, + int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize, int accessMax, + int allocatedUnits, int maxAccessedIndex +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] 
+ // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // Not really: + // allocSizeExpr = buffer.(Call).getArgument(0) and + // + DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and + allocsize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + 1 = allocBaseSize and + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + upperBound(accessIdx) = accessMax and + allocBaseSize * allocsize = allocatedUnits and + arrayTypeSize * accessMax = maxAccessedIndex and + // only consider out-of-bounds + maxAccessedIndex >= allocatedUnits +select access, + "Array access at or beyond size; have " + allocatedUnits + " units, access at " + maxAccessedIndex, + globalValueNumber(accessIdx) as gvnAccess, globalValueNumber(allocSizeExpr) as gvnAlloc +``` + + + + +### interim + +Messy, start over. + +```java +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis +import semmle.code.cpp.valuenumbering.GlobalValueNumbering + +// Step 9 +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, GVN gvnAccess, + GVN gvnAlloc +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // unsigned long size = 100; + // ... 
+ // char *buf = malloc(size); + DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) and + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // + // Use GVN + globalValueNumber(accessIdx) = gvnAccess and + globalValueNumber(allocSizeExpr) = gvnAlloc and + ( + gvnAccess = gvnAlloc + or + // buf[sz * x * y] above + // buf[sz * x * y + 1]; + exists(AddExpr add | + accessIdx = add and + // add.getAnOperand() = accessIdx and + add.getAnOperand().getValue().toInt() > 0 and + globalValueNumber(add.getAnOperand()) = gvnAlloc + ) + ) +select access, gvnAccess, gvnAlloc +``` + + + + +## TODO hashcons + +import semmle.code.cpp.valuenumbering.HashCons + +hashcons: every value gets a number based on structure. Fails on + + char *buf = malloc(sz * x * y); + sz = 100; + buf[sz * x * y - 1]; // COMPLIANT + +The final exercise is to implement the `isOffsetOutOfBoundsGVN` predicate to […] diff --git a/session/session.org b/session/session.org index 1296081..5d1e3a3 100644 --- a/session/session.org +++ b/session/session.org @@ -132,22 +132,27 @@ Standards repository]]. :END: Unlike the the [[../README.md#org3b74422][exercises]] which use the /collection/ of test problems in -=exercises-test=, this workshop is a sequential session as one following the -actual process of writing CodeQL: use a /single/ database built from a single, -larger segment of code. For this workshop, the larger segment is still simplified -skeleton code, not a full source code repository. +=exercises-test=, this workshop is a sequential session following the actual +process of writing CodeQL: use a /single/ database built from a single, larger +segment of code and inspect the query results as you write the query. + +For this workshop, the larger segment of code is still simplified skeleton code, +not a full source code repository. 
+ +The queries are embedded in `session.md` but can also be found in the +`example*.ql` files. They can all be run as test cases in VS Code. ** Step 1 :PROPERTIES: :CUSTOM_ID: exercise-1 :END: - In the first step we are going to start by - 1. modelling a dynamic allocation with =malloc= and - 2. an access to that allocated buffer with an - 3. array expression. + In the first step we are going to + 1. identify a dynamic allocation with =malloc= and + 2. an access to that allocated buffer. The access is via an array expression; + we are *not* going to cover pointer dereferencing. - The goal of this exercise is to then output the array access, buffer, array size, - and buffer offset. + The goal of this exercise is to then output the array access, array size, + buffer, and buffer offset. The focus here is on : void test_const(void) @@ -163,36 +168,14 @@ skeleton code, not a full source code repository. constant expression. *** Solution - #+BEGIN_SRC java :tangle example1.ql - import cpp - import semmle.code.cpp.dataflow.DataFlow - - from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr - where - // malloc (100) - // ^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] - // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset().getValue().toInt() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - allocSizeExpr = buffer.(Call).getArgument(0) and - bufferSize = allocSizeExpr.getValue().toInt() - select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr - #+END_SRC + #+INCLUDE: "example1.ql" src java This produces 12 results, with some cross-function pairs. ** Step 2 The previous query fails to connect the =malloc= calls with the array accesses, -and =mallocs= from one function are paired with accesses in another. +and in the results, =mallocs= from one function are paired with accesses in +another. 
To address these, take the query from the previous exercise and 1. connect the allocation(s) with the @@ -209,40 +192,7 @@ To address these, take the query from the previous exercise and *** Solution - #+BEGIN_SRC java :tangle example2.ql - import cpp - import semmle.code.cpp.dataflow.DataFlow - - // Step 2 - // void test_const(void) - // void test_const_var(void) - from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr - where - // malloc (100) - // ^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] - // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset().getValue().toInt() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - allocSizeExpr = buffer.(Call).getArgument(0) and - bufferSize = allocSizeExpr.getValue().toInt() and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) - select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr - #+END_SRC + #+INCLUDE: "example2.ql" src java *** Results There are now 3 results. These are from only one function, the one using constants. @@ -269,115 +219,43 @@ To address these, take the query from the previous exercise and *** Solution - #+BEGIN_SRC java :tangle example3.ql - - import cpp - import semmle.code.cpp.dataflow.DataFlow - - // Step 3 - // void test_const_var(void) - from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr - where - // malloc (100) - // ^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] - // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset().getValue().toInt() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - allocSizeExpr = buffer.(Call).getArgument(0) and - // bufferSize = allocSizeExpr.getValue().toInt() and - // char *buf = ... 
buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) - select buffer, access, accessIdx, access.getArrayOffset() - #+END_SRC + #+INCLUDE: "example3.ql" src java *** Results Now, we get 12 results, including some from other test cases. ** Step 4 We are looking for out-of-bounds accesses, so we to need to include the - bounds. But in a different way. + bounds. But in a more general way than looking only at constant values. Note the results for the cases in =test_const_var= which involve a variable - access rather than a constant. The next goal is to handle the case where the - allocation size or array index are variables (with constant values) rather than - integer constants. + access rather than a constant. The next goal is + 1. to handle the case where the allocation size or array index are variables + (with constant values) rather than integer constants. We have an expression =size= that flows into the =malloc()= call. +*** Hint + *** Solution - #+BEGIN_SRC java :tangle example4.ql - import cpp - import semmle.code.cpp.dataflow.DataFlow - - // Step 4 - from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr, int bufferSize, Expr bse - where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] - // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset().getValue().toInt() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - allocSizeExpr = buffer.(Call).getArgument(0) and - // bufferSize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - exists(Expr bufferSizeExpr | - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize - and bse = bufferSizeExpr - ) and - // char *buf = ... 
buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) - select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bse - #+END_SRC + #+INCLUDE: "example4.ql" src java *** Results Now, we get 15 results, limited to statically determined values. - XX: to implement predicates - =getSourceConstantExpr=, =getFixedSize=, and =getFixedArrayOffset= - Use local data-flow analysis to complete the =getSourceConstantExpr= - predicate. The =getFixedSize= and =getFixedArrayOffset= predicates can - be completed using =getSourceConstantExpr=. + XX: Implement predicates + =getSourceConstantExpr=, =getFixedSize=, and =getFixedArrayOffset= + Use local data-flow analysis to complete the =getSourceConstantExpr= + predicate. The =getFixedSize= and =getFixedArrayOffset= predicates can + be completed using =getSourceConstantExpr=. -XX: - 1. start with query. - =elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()= - 2. convert to predicate. - 3. then use classes, if desired. =class BufferAccess extends ArrayExpr= - is different from those below. - + XX: + 1. start with query. + =elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()= + 2. convert to predicate. + 3. then use classes, if desired. =class BufferAccess extends ArrayExpr= + is different from those below. 
** Step 5 -- SimpleRangeAnalysis Running the query from Step 2 against the database yields a @@ -413,50 +291,7 @@ XX: : import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis *** Solution - #+BEGIN_SRC java :tangle example5.ql - import cpp - import semmle.code.cpp.dataflow.DataFlow - import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - - // Step 5 - from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr - where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] - // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) - select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr - #+END_SRC + #+INCLUDE: "example5.ql" src java *** Results Now, we get 48 results. @@ -491,48 +326,7 @@ XX: =double=? *** Solution - #+BEGIN_SRC java :tangle example6.ql - import cpp - import semmle.code.cpp.dataflow.DataFlow - import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - - // Step 6 - from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr - where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] 
- // ^^^ ArrayExpr access - // buf[...] - // ^^^ int accessIdx - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) - select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, 1 as allocBaseSize - #+END_SRC + #+INCLUDE: "example6.ql" src java *** Results 48 results in the table @@ -546,52 +340,7 @@ XX: 3. Compare buffer allocation size to the access index. *** Solution: - #+BEGIN_SRC java :tangle example7.ql - import cpp - import semmle.code.cpp.dataflow.DataFlow - import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - - // Step 7 - from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize - where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // buf[...] 
- // ^^^ int accessIdx - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() - and - 1 = allocBaseSize - and - DataFlow::localExprFlow(buffer, access.getArrayBase()) - select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, allocSizeExpr, allocBaseSize * allocsize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex - #+END_SRC + #+INCLUDE: "example7.ql" src java *** Results 48 results in the much cleaner table @@ -612,60 +361,7 @@ XX: to get nicer reporting. *** Solution: - #+BEGIN_SRC java :tangle example8.ql - /** - ,* @kind problem - ,*/ - - import cpp - import semmle.code.cpp.dataflow.DataFlow - import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - - // Step 8 - from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize, int accessMax, - int allocatedUnits, int maxAccessedIndex - where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // buf[...] 
- // ^^^ int accessIdx - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and - 1 = allocBaseSize and - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - upperBound(accessIdx) = accessMax and - allocBaseSize * allocsize = allocatedUnits and - arrayTypeSize * accessMax = maxAccessedIndex and - // only consider out-of-bounds - maxAccessedIndex >= allocatedUnits - select access, "Array access at or beyond size; have "+allocatedUnits + " units, access at "+ maxAccessedIndex - #+END_SRC + #+INCLUDE: "example8.ql" src java *** Results 14 results in the much cleaner table @@ -794,58 +490,7 @@ XX: *** interim Messy, start over. - #+BEGIN_SRC java - - import cpp - import semmle.code.cpp.dataflow.DataFlow - import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - import semmle.code.cpp.valuenumbering.GlobalValueNumbering - - // Step 9 - from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, GVN gvnAccess, - GVN gvnAlloc - where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // buf[...] - // ^^^ accessIdx - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // unsigned long size = 100; - // ... 
- // char *buf = malloc(size); - DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // - // Use GVN - globalValueNumber(accessIdx) = gvnAccess and - globalValueNumber(allocSizeExpr) = gvnAlloc and - ( - gvnAccess = gvnAlloc - or - // buf[sz * x * y] above - // buf[sz * x * y + 1]; - exists(AddExpr add | - accessIdx = add and - // add.getAnOperand() = accessIdx and - add.getAnOperand().getValue().toInt() > 0 and - globalValueNumber(add.getAnOperand()) = gvnAlloc - ) - ) - select access, gvnAccess, gvnAlloc - #+END_SRC + #+INCLUDE: "example9.ql" src java ** TODO hashcons import semmle.code.cpp.valuenumbering.HashCons From 5e1ad9e06d86b8cc4b124a07f1e45e6bc65c625b Mon Sep 17 00:00:00 2001 From: Nikita Kraiouchkine Date: Tue, 16 May 2023 17:09:11 +0200 Subject: [PATCH 10/28] Add WIP template for simplification Exercise6.ql has been simplified to use only predicates rather than classes RuntimeValues.qll now pre-implements getExprOffsetValue --- library/RuntimeValues.qll | 16 +++++++++ library/qlpack.yml | 6 ++++ session/session.md | 2 ++ solutions/Exercise6.ql | 68 ++++++++++----------------------------- solutions/qlpack.yml | 3 +- 5 files changed, 43 insertions(+), 52 deletions(-) create mode 100644 library/RuntimeValues.qll create mode 100644 library/qlpack.yml diff --git a/library/RuntimeValues.qll b/library/RuntimeValues.qll new file mode 100644 index 0000000..c3a37da --- /dev/null +++ b/library/RuntimeValues.qll @@ -0,0 +1,16 @@ +import cpp + +bindingset[expr] +int getExprOffsetValue(Expr expr, Expr base) { + result = expr.(AddExpr).getRightOperand().getValue().toInt() and + base = expr.(AddExpr).getLeftOperand() + or + result = -expr.(SubExpr).getRightOperand().getValue().toInt() and + base = expr.(SubExpr).getLeftOperand() + or + // currently only AddExpr and SubExpr are supported: 
else, fall-back to 0 + not expr instanceof AddExpr and + not expr instanceof SubExpr and + base = expr and + result = 0 +} diff --git a/library/qlpack.yml b/library/qlpack.yml new file mode 100644 index 0000000..6629dc7 --- /dev/null +++ b/library/qlpack.yml @@ -0,0 +1,6 @@ +--- +library: true +name: library +version: 0.0.1 +dependencies: + codeql/cpp-all: 0.6.1 \ No newline at end of file diff --git a/session/session.md b/session/session.md index ea6f975..16a5a5e 100644 --- a/session/session.md +++ b/session/session.md @@ -153,6 +153,8 @@ In the first step we are going to 1. identify a dynamic allocation with `malloc` and 2. an access to that allocated buffer. The access is via an array expression; we are **not** going to cover pointer dereferencing. +We are going to accomplish these tasks via predicates. + The goal of this exercise is to then output the array access, array size, buffer, and buffer offset. The focus here is on diff --git a/solutions/Exercise6.ql b/solutions/Exercise6.ql index f316e36..34516d9 100644 --- a/solutions/Exercise6.ql +++ b/solutions/Exercise6.ql @@ -9,6 +9,7 @@ import cpp import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.valuenumbering.GlobalValueNumbering +import RuntimeValues /** * Gets an expression that flows to `dest` and has a constant value. 
@@ -36,73 +37,38 @@ int getMaxStatedValue(Expr e) { result = upperBound(e).minimum(max(getSourceConstantExpr(e).getValue().toInt())) } -bindingset[expr] -int getExprOffsetValue(Expr expr, Expr base) { - result = expr.(AddExpr).getRightOperand().getValue().toInt() and - base = expr.(AddExpr).getLeftOperand() - or - result = -expr.(SubExpr).getRightOperand().getValue().toInt() and - base = expr.(SubExpr).getLeftOperand() - or - // currently only AddExpr and SubExpr are supported: else, fall-back to 0 - not expr instanceof AddExpr and - not expr instanceof SubExpr and - base = expr and - result = 0 +predicate allocatedBufferArrayAccess(ArrayExpr access, FunctionCall alloc) { + alloc.getTarget().hasName("malloc") and + DataFlow::localExprFlow(alloc, access.getArrayBase()) } -class AllocationCall extends FunctionCall { - AllocationCall() { this.getTarget() instanceof AllocationFunction } - - Expr getBuffer() { result = this } - - Expr getSizeExpr() { - // AllocationExpr may sometimes return a subexpression of the size expression - // in order to separate the size from a sizeof expression in a MulExpr. 
- exists(AllocationFunction f | - f = this.(FunctionCall).getTarget() and - result = this.(FunctionCall).getArgument(f.getSizeArg()) - ) - } - - int getFixedSize() { result = getMaxStatedValue(this.getSizeExpr()) } -} - -class AccessExpr extends ArrayExpr { - AllocationCall source; - - AccessExpr() { DataFlow::localExprFlow(source.getBuffer(), this.getArrayBase()) } - - AllocationCall getSource() { result = source } - - int getFixedArrayOffset() { - exists(Expr base, int offset | - offset = getExprOffsetValue(this.getArrayOffset(), base) and - result = getMaxStatedValue(base) + offset - ) - } +int getFixedArrayOffset(ArrayExpr access) { + exists(Expr base, int offset | + offset = getExprOffsetValue(access.getArrayOffset(), base) and + result = getMaxStatedValue(base) + offset + ) } predicate isOffsetOutOfBoundsConstant( - AccessExpr access, AllocationCall source, int allocSize, int accessOffset + ArrayExpr access, FunctionCall source, int allocSize, int accessOffset ) { - source = access.getSource() and - allocSize = source.getFixedSize() and - accessOffset = access.getFixedArrayOffset() and + allocatedBufferArrayAccess(access, source) and + allocSize = getMaxStatedValue(source.getArgument(0)) and + accessOffset = getFixedArrayOffset(access) and accessOffset >= allocSize } -predicate isOffsetOutOfBoundsGVN(AccessExpr access, AllocationCall source) { - source = access.getSource() and +predicate isOffsetOutOfBoundsGVN(ArrayExpr access, FunctionCall source) { + allocatedBufferArrayAccess(access, source) and not isOffsetOutOfBoundsConstant(access, source, _, _) and exists(Expr accessOffsetBase, int accessOffsetBaseValue | accessOffsetBaseValue = getExprOffsetValue(access.getArrayOffset(), accessOffsetBase) and - globalValueNumber(source.getSizeExpr()) = globalValueNumber(accessOffsetBase) and + globalValueNumber(source.getArgument(0)) = globalValueNumber(accessOffsetBase) and not accessOffsetBaseValue < 0 ) } -from AllocationCall source, AccessExpr access, string 
message +from FunctionCall source, ArrayExpr access, string message where exists(int allocSize, int accessOffset | isOffsetOutOfBoundsConstant(access, source, allocSize, accessOffset) and diff --git a/solutions/qlpack.yml b/solutions/qlpack.yml index b2d0347..8c53758 100644 --- a/solutions/qlpack.yml +++ b/solutions/qlpack.yml @@ -3,4 +3,5 @@ library: false name: solutions version: 0.0.1 dependencies: - codeql/cpp-all: 0.6.1 \ No newline at end of file + codeql/cpp-all: 0.6.1 + library: "*" \ No newline at end of file From 6695f07c819a5759d2152e1f1b987f40f6c0311f Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 16 May 2023 15:20:06 -0700 Subject: [PATCH 11/28] Expanded the "Session/Workshop notes" section --- session/session.org | 45 +++++++++++++++++++++++++++++++++------------ 1 file changed, 33 insertions(+), 12 deletions(-) diff --git a/session/session.org b/session/session.org index 5d1e3a3..1bd834d 100644 --- a/session/session.org +++ b/session/session.org @@ -131,16 +131,37 @@ Standards repository]]. :CUSTOM_ID: sessionworkshop-notes :END: -Unlike the the [[../README.md#org3b74422][exercises]] which use the /collection/ of test problems in -=exercises-test=, this workshop is a sequential session following the actual -process of writing CodeQL: use a /single/ database built from a single, larger -segment of code and inspect the query results as you write the query. - -For this workshop, the larger segment of code is still simplified skeleton code, -not a full source code repository. - -The queries are embedded in `session.md` but can also be found in the -`example*.ql` files. They can all be run as test cases in VS Code. + Unlike the the [[../README.md#org3b74422][exercises]] which use the /collection/ of test problems in + =exercises-test=, this workshop is a sequential session following the actual + process of writing CodeQL: use a /single/ database built from a single, larger + segment of code and inspect the query results as you write the query. 
+ + For this workshop, the larger segment of code is still simplified skeleton code, + not a full source code repository. + + The queries are embedded in `session.md` but can also be found in the + `example*.ql` files. They can all be run as test cases in VS Code. + + To reiterate: + + This workshop focuses on analyzing and relating two /static/ values --- array + access indices and memory allocation sizes --- in order to identify + simple cases of out-of-bounds array accesses. We do not handle /dynamic/ values + but take advantage of special cases. + + To find these issues, + 1. We can implement an analysis that tracks the upper or lower bounds on an + expression. + 2. We then combine this with data-flow analysis to reduce false positives and + identify cases where the index of the array results in an access beyond the + allocated size of the buffer. + 3. We further extend these queries with rudimentary arithmetic support involving + expressions common to the allocation and the array access. + 4. For cases where this is insufficient, we introduce global value numbering + [[https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering][GVN]] in [[*Step 9 -- Global Value Numbering][Step 9 -- Global Value Numbering]], to detect values known to be equal + at runtime. + 5. When /those/ cases are insufficient, and we handle the case of identical + structure using [[*hashconsing][hashconsing]]. ** Step 1 :PROPERTIES: @@ -385,7 +406,7 @@ To address these, take the query from the previous exercise and constants that flow to a given expression. Another approach is global value numbering, used next. -** Step 9 -- GlobalValueNumbering +** Step 9 -- Global Value Numbering Range analyis won't bound =sz * x * y=, so switch to global value numbering. 
This is the case in the last test case, @@ -492,7 +513,7 @@ To address these, take the query from the previous exercise and #+INCLUDE: "example9.ql" src java -** TODO hashcons +** TODO hashconsing import semmle.code.cpp.valuenumbering.HashCons hashcons: every value gets a number based on structure. Fails on From 22126edf22f172a3266ece44657820366d7b92bb Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 16 May 2023 21:24:57 -0700 Subject: [PATCH 12/28] Introduce some predicates at step 4a --- session-tests/Example4a/example4a.expected | 15 ++ session-tests/Example4a/example4a.qlref | 1 + session-tests/Example4a/test.c | 85 +++++++++ session/example2.ql | 2 + session/example3.ql | 2 + session/example4.ql | 1 + session/example4a.ql | 46 +++++ session/session.md | 210 +++++++++++++++------ session/session.org | 21 +++ 9 files changed, 325 insertions(+), 58 deletions(-) create mode 100644 session-tests/Example4a/example4a.expected create mode 100644 session-tests/Example4a/example4a.qlref create mode 100644 session-tests/Example4a/test.c create mode 100644 session/example4a.ql diff --git a/session-tests/Example4a/example4a.expected b/session-tests/Example4a/example4a.expected new file mode 100644 index 0000000..88b6955 --- /dev/null +++ b/session-tests/Example4a/example4a.expected @@ -0,0 +1,15 @@ +| test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 99 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 | +| test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 | +| test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | 0 | test.c:17:9:17:9 | 0 | 100 | test.c:15:26:15:28 | 100 | +| test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | 99 | test.c:18:9:18:10 | 99 | 100 | test.c:15:26:15:28 | 
100 | +| test.c:16:17:16:22 | call to malloc | test.c:20:5:20:12 | access to array | 100 | test.c:20:9:20:11 | 100 | 100 | test.c:15:26:15:28 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | 0 | test.c:35:9:35:9 | 0 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | 0 | test.c:35:9:35:9 | 0 | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | 99 | test.c:36:9:36:10 | 99 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | 99 | test.c:36:9:36:10 | 99 | 200 | test.c:26:45:26:47 | 200 | +| test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | 100 | test.c:38:9:38:11 | 100 | 100 | test.c:26:39:26:41 | 100 | +| test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | 100 | test.c:38:9:38:11 | 100 | 200 | test.c:26:45:26:47 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:65:5:65:10 | access to array | 0 | test.c:65:9:65:9 | 0 | 200 | test.c:55:22:55:24 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:66:5:66:12 | access to array | 100 | test.c:66:9:66:11 | 100 | 200 | test.c:55:22:55:24 | 200 | +| test.c:63:17:63:22 | call to malloc | test.c:67:5:67:12 | access to array | 200 | test.c:67:9:67:11 | 200 | 200 | test.c:55:22:55:24 | 200 | diff --git a/session-tests/Example4a/example4a.qlref b/session-tests/Example4a/example4a.qlref new file mode 100644 index 0000000..d588285 --- /dev/null +++ b/session-tests/Example4a/example4a.qlref @@ -0,0 +1 @@ +example4a.ql diff --git a/session-tests/Example4a/test.c b/session-tests/Example4a/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example4a/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; 
// COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session/example2.ql b/session/example2.ql index 27c2eb4..ea93823 100644 --- a/session/example2.ql +++ b/session/example2.ql @@ -22,6 +22,8 @@ where // allocSizeExpr = buffer.(Call).getArgument(0) and bufferSize = allocSizeExpr.getValue().toInt() and + // + // Ensure alloc and buffer access are in the same function // char *buf = ... 
buf[0]; // ^^^ ---> ^^^ // or diff --git a/session/example3.ql b/session/example3.ql index f7dfc98..a93c6b6 100644 --- a/session/example3.ql +++ b/session/example3.ql @@ -21,6 +21,8 @@ where // allocSizeExpr = buffer.(Call).getArgument(0) and // bufferSize = allocSizeExpr.getValue().toInt() and + // + // Ensure alloc and buffer access are in the same function // char *buf = ... buf[0]; // ^^^ ---> ^^^ // or diff --git a/session/example4.ql b/session/example4.ql index 8cdff66..0ae4a0e 100644 --- a/session/example4.ql +++ b/session/example4.ql @@ -29,6 +29,7 @@ where bufferSizeExpr.getValue().toInt() = bufferSize and bse = bufferSizeExpr ) and + // Ensure alloc and buffer access are in the same function // char *buf = ... buf[0]; // ^^^ ---> ^^^ // or diff --git a/session/example4a.ql b/session/example4a.ql new file mode 100644 index 0000000..b90a317 --- /dev/null +++ b/session/example4a.ql @@ -0,0 +1,46 @@ +import cpp +import semmle.code.cpp.dataflow.DataFlow + +from AllocationExpr buffer, ArrayExpr access, int accessIdx, int bufferSize, Expr bufferSizeExpr +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure alloc and buffer access are in the same function + ensureSameFunction(buffer, access.getArrayBase()) and + // Ensure size definition and use are in same function, even for non-constant expressions. + ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) +// +select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bufferSizeExpr + +/** Ensure the two expressions are in the same function body. */ +predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } + +/** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. 
+ */ +predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // + // Capture BOTH with dataflow: + // 1. + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // 2. + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) +} diff --git a/session/session.md b/session/session.md index 16a5a5e..0611e8b 100644 --- a/session/session.md +++ b/session/session.md @@ -6,35 +6,37 @@ - [Session/Workshop notes](#sessionworkshop-notes) - [Step 1](#exercise-1) - [Hints](#hints) - - [Solution](#org8ca1443) - - [Step 2](#org6138b3d) + - [Solution](#orgdb027c4) + - [Step 2](#org103a4a0) - [Hints](#hints) - - [Solution](#org287ad06) - - [Results](#org4b8509f) + - [Solution](#org9a38362) + - [Results](#org0373f43) - [Step 3](#exercise-2) - - [Solution](#orga37db88) - - [Results](#org22d1a25) - - [Step 4](#org493babd) - - [Hint](#org57d9881) - - [Solution](#org9303851) - - [Results](#org9ba681e) - - [Step 5 – SimpleRangeAnalysis](#orgda84218) - - [Solution](#orgb5a7df0) - - [Results](#orgf04ac53) - - [Step 6](#orgd9ab97c) - - [Solution](#org79d9ce3) - - [Results](#org00d27a6) - - [Step 7](#org4bfd9c3) - - [Solution:](#orgf500bdf) - - [Results](#org07a41ff) - - [Step 8](#orgd642b5f) - - [Solution:](#org696e813) - - [Results](#org77abe31) - - [Interim notes](#org03ebd84) - - [Step 9 – GlobalValueNumbering](#org29bb594) - - [interim](#orgfc8f904) - - [interim](#org53cf2e1) - - [hashcons](#org7ccef88) + - [Solution](#org100e79f) + - [Results](#orgc91557b) + - [Step 4](#org5ed496c) + - [Hint](#org353b905) + - [Solution](#org5e7a90d) + - [Results](#orgc376727) + - [Step 4a – some clean-up using predicates](#org5fbc890) + - [Solution](#orgc0ef9cd) + - [Step 5 – SimpleRangeAnalysis](#org3250327) + - [Solution](#orga42f2d0) + - [Results](#org2dd5caf) + - [Step 6](#org7bfbf7f) + - [Solution](#orgbf7f580) + 
- [Results](#orgc770d19) + - [Step 7](#org27d428b) + - [Solution:](#org1d7080e) + - [Results](#orgcc04f97) + - [Step 8](#orgd04e3b9) + - [Solution:](#org6638628) + - [Results](#org42712eb) + - [Interim notes](#org4007937) + - [Step 9 – Global Value Numbering](#orgde8bf97) + - [interim](#org08c13b9) + - [interim](#org11a3f79) + - [hashconsing](#orgc7ce1fc) @@ -143,6 +145,18 @@ For this workshop, the larger segment of code is still simplified skeleton code, The queries are embedded in \`session.md\` but can also be found in the \`example\*.ql\` files. They can all be run as test cases in VS Code. +To reiterate: + +This workshop focuses on analyzing and relating two *static* values — array access indices and memory allocation sizes — in order to identify simple cases of out-of-bounds array accesses. We do not handle *dynamic* values but take advantage of special cases. + +To find these issues, + +1. We can implement an analysis that tracks the upper or lower bounds on an expression. +2. We then combine this with data-flow analysis to reduce false positives and identify cases where the index of the array results in an access beyond the allocated size of the buffer. +3. We further extend these queries with rudimentary arithmetic support involving expressions common to the allocation and the array access. +4. For cases where this is insufficient, we introduce global value numbering [GVN](https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering) in [Step 9 – Global Value Numbering](#orgde8bf97), to detect values known to be equal at runtime. +5. When *those* cases are insufficient, and we handle the case of identical structure using [hashconsing](#orgc7ce1fc). + @@ -153,8 +167,6 @@ In the first step we are going to 1. identify a dynamic allocation with `malloc` and 2. an access to that allocated buffer. The access is via an array expression; we are **not** going to cover pointer dereferencing. 
-We are going to accomplish these tasks via predicates. - The goal of this exercise is to then output the array access, array size, buffer, and buffer offset. The focus here is on @@ -175,7 +187,7 @@ in [db.c](file:///Users/hohn/local/codeql-workshop-runtime-values-c/session-db/D 1. `Expr::getValue()::toInt()` can be used to get the integer value of a constant expression. - + ### Solution @@ -207,7 +219,7 @@ select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSize This produces 12 results, with some cross-function pairs. - + ## Step 2 @@ -227,7 +239,7 @@ To address these, take the query from the previous exercise and 2. The the array base is the `buf` part of `buf[0]`. Use the `Expr.getArrayBase()` predicate. - + ### Solution @@ -256,6 +268,8 @@ where // allocSizeExpr = buffer.(Call).getArgument(0) and bufferSize = allocSizeExpr.getValue().toInt() and + // + // Ensure alloc and buffer access are in the same function // char *buf = ... buf[0]; // ^^^ ---> ^^^ // or @@ -267,7 +281,7 @@ select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSize ``` - + ### Results @@ -295,7 +309,7 @@ Here, the `malloc` argument is a variable with known value. We include this result by removing the size-retrieval from the prior query. - + ### Solution @@ -323,6 +337,8 @@ where // allocSizeExpr = buffer.(Call).getArgument(0) and // bufferSize = allocSizeExpr.getValue().toInt() and + // + // Ensure alloc and buffer access are in the same function // char *buf = ... buf[0]; // ^^^ ---> ^^^ // or @@ -334,14 +350,14 @@ select buffer, access, accessIdx, access.getArrayOffset() ``` - + ### Results Now, we get 12 results, including some from other test cases. - + ## Step 4 @@ -354,12 +370,12 @@ Note the results for the cases in `test_const_var` which involve a variable acce We have an expression `size` that flows into the `malloc()` call. 
- + ### Hint - + ### Solution @@ -395,6 +411,7 @@ where bufferSizeExpr.getValue().toInt() = bufferSize and bse = bufferSizeExpr ) and + // Ensure alloc and buffer access are in the same function // char *buf = ... buf[0]; // ^^^ ---> ^^^ // or @@ -406,7 +423,7 @@ select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bse ``` - + ### Results @@ -421,7 +438,84 @@ XX: 3. then use classes, if desired. `class BufferAccess extends ArrayExpr` is different from those below. - + + +## Step 4a – some clean-up using predicates + +Note that the dataflow automatically captures/includes the + + allocSizeExpr = buffer.(Call).getArgument(0) + +so that's now redundant with `bufferSizeExpr` and can be removed. + +```java + +allocSizeExpr = buffer.(Call).getArgument(0) and +// bufferSize = allocSizeExpr.getValue().toInt() and +// +// unsigned long size = 100; +// ... +// char *buf = malloc(size); +DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + +``` + + + + +### Solution + +```java +import cpp +import semmle.code.cpp.dataflow.DataFlow + +from AllocationExpr buffer, ArrayExpr access, int accessIdx, int bufferSize, Expr bufferSizeExpr +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure alloc and buffer access are in the same function + ensureSameFunction(buffer, access.getArrayBase()) and + // Ensure size defintion and use are in same function, even for non-constant expressions. + ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) +// +select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bufferSizeExpr + +/** Ensure the two expressions are in the same function body. 
*/ +predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } + +/** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ +predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // + // Capture BOTH with datflow: + // 1. + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // 2. + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) +} +``` + + + ## Step 5 – SimpleRangeAnalysis @@ -445,7 +539,7 @@ This requires the import import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - + ### Solution @@ -495,14 +589,14 @@ select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, acces ``` - + ### Results Now, we get 48 results. - + ## Step 6 @@ -524,7 +618,7 @@ Hints: 4. These test cases all use type `char`. What would happen for `int` or `double`? - + ### Solution @@ -572,7 +666,7 @@ select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, acces ``` - + ### Results @@ -583,7 +677,7 @@ select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, acces | 1 | call to malloc | 200 | access to array | 0 | 0 | 200 | 200 | char | 1 | 1 | - + ## Step 7 @@ -592,7 +686,7 @@ select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, acces 3. Compare buffer allocation size to the access index. 
- + ### Solution: @@ -644,7 +738,7 @@ select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, alloc ``` - + ### Results @@ -654,7 +748,7 @@ select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, alloc | 1 | call to malloc | 200 | access to array | 0 | 200 | 200 | 0 | | - + ## Step 8 @@ -672,7 +766,7 @@ select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, alloc to get nicer reporting. - + ### Solution: @@ -732,7 +826,7 @@ select access, "Array access at or beyond size; have "+allocatedUnits + " units, ``` - + ### Results @@ -743,7 +837,7 @@ select access, "Array access at or beyond size; have "+allocatedUnits + " units, | Array access at or beyond size; have 200 units, access at 200 | db.c:67:5 | - + ## Interim notes @@ -756,9 +850,9 @@ int val = rand() ? rand() : 30; A similar case is present in the `test_const_branch` and `test_const_branch2` test-cases. In these cases, it is necessary to augment range analysis with data-flow and restrict the bounds to the upper or lower bound of computable constants that flow to a given expression. Another approach is global value numbering, used next. - + -## Step 9 – GlobalValueNumbering +## Step 9 – Global Value Numbering Range analyis won't bound `sz * x * y`, so switch to global value numbering. This is the case in the last test case, @@ -795,7 +889,7 @@ We can use global value numbering to identify common values as first step, but f we have to "evaluate" the expressions – or at least bound them. - + ### interim @@ -858,7 +952,7 @@ select access, ``` - + ### interim @@ -917,9 +1011,9 @@ select access, gvnAccess, gvnAlloc ``` - + -## TODO hashcons +## TODO hashconsing import semmle.code.cpp.valuenumbering.HashCons diff --git a/session/session.org b/session/session.org index 1bd834d..c78f7d9 100644 --- a/session/session.org +++ b/session/session.org @@ -278,6 +278,27 @@ To address these, take the query from the previous exercise and 3. then use classes, if desired. 
=class BufferAccess extends ArrayExpr= is different from those below. +** Step 4a -- some clean-up using predicates + + Note that the dataflow automatically captures/includes the + : allocSizeExpr = buffer.(Call).getArgument(0) + so that's now redundant with =bufferSizeExpr= and can be removed. + #+BEGIN_SRC java + + allocSizeExpr = buffer.(Call).getArgument(0) and + // bufferSize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + + #+END_SRC + +*** Solution + + #+INCLUDE: "example4a.ql" src java + ** Step 5 -- SimpleRangeAnalysis Running the query from Step 2 against the database yields a significant number of missing or incorrect results. The reason is that From b44418539a3ef661872a7ee1e77c4d41818190c6 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Wed, 17 May 2023 08:24:27 -0700 Subject: [PATCH 13/28] Further clarify the goals in "Session/Workshop notes" --- session/session.org | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/session/session.org b/session/session.org index c78f7d9..6963891 100644 --- a/session/session.org +++ b/session/session.org @@ -157,10 +157,12 @@ Standards repository]]. allocated size of the buffer. 3. We further extend these queries with rudimentary arithmetic support involving expressions common to the allocation and the array access. - 4. For cases where this is insufficient, we introduce global value numbering + 4. For cases where constant expressions are not available or are uncertain, we + first try [[*Step 5 -- SimpleRangeAnalysis][range analysis]] to expand the query's applicability. + 5. 
For cases where this is insufficient, we introduce global value numbering [[https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering][GVN]] in [[*Step 9 -- Global Value Numbering][Step 9 -- Global Value Numbering]], to detect values known to be equal at runtime. - 5. When /those/ cases are insufficient, and we handle the case of identical + 6. When /those/ cases are insufficient, we handle the case of identical structure using [[*hashconsing][hashconsing]]. ** Step 1 From 71153b22e6c5714994aa1972b531961ba5cd2ccf Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Wed, 17 May 2023 09:42:21 -0700 Subject: [PATCH 14/28] WIP: Update example5. List predicates to be introduced --- session-tests/Example5/example5.expected | 80 ++++++------- session/example5.ql | 57 +++++---- session/session.org | 140 ++++++++++++++++++----- 3 files changed, 178 insertions(+), 99 deletions(-) diff --git a/session-tests/Example5/example5.expected b/session-tests/Example5/example5.expected index 1b5c7e7..079512f 100644 --- a/session-tests/Example5/example5.expected +++ b/session-tests/Example5/example5.expected @@ -1,48 +1,32 @@ -| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:8:5:8:10 | access to array | 0.0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 | -| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:9:5:9:11 | access to array | 99.0 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 | -| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:10:5:10:12 | access to array | 100.0 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:17:5:17:10 | access to array | 0.0 | test.c:17:9:17:9 | 0 | 100 | test.c:15:26:15:28 | 100 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:18:5:18:11 | access to array | 99.0 | test.c:18:9:18:10 | 99 | 100 | test.c:15:26:15:28 | 100 | -| test.c:16:17:16:22 | 
call to malloc | test.c:15:26:15:28 | 100 | test.c:19:5:19:17 | access to array | 99.0 | test.c:19:9:19:16 | ... - ... | 100 | test.c:15:26:15:28 | 100 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:20:5:20:12 | access to array | 100.0 | test.c:20:9:20:11 | 100 | 100 | test.c:15:26:15:28 | 100 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:21:5:21:13 | access to array | 100.0 | test.c:21:9:21:12 | size | 100 | test.c:15:26:15:28 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... 
| 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... 
| 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... 
| 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 200 | test.c:26:45:26:47 | 200 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... 
| 200 | test.c:26:45:26:47 | 200 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:65:5:65:10 | access to array | 0.0 | test.c:65:9:65:9 | 0 | 200 | test.c:55:22:55:24 | 200 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:66:5:66:12 | access to array | 100.0 | test.c:66:9:66:11 | 100 | 200 | test.c:55:22:55:24 | 200 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:67:5:67:12 | access to array | 200.0 | test.c:67:9:67:11 | 200 | 200 | test.c:55:22:55:24 | 200 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:68:5:68:23 | access to array | 1.8446744073709552E19 | test.c:68:9:68:22 | ... - ... | 200 | test.c:55:22:55:24 | 200 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:69:5:69:19 | access to array | 1.8446744073709552E19 | test.c:69:9:69:18 | alloc_size | 200 | test.c:55:22:55:24 | 200 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:73:9:73:23 | access to array | 198.0 | test.c:73:13:73:22 | alloc_size | 200 | test.c:55:22:55:24 | 200 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:74:9:74:27 | access to array | 199.0 | test.c:74:13:74:26 | ... + ... | 200 | test.c:55:22:55:24 | 200 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:75:9:75:27 | access to array | 200.0 | test.c:75:13:75:26 | ... + ... 
| 200 | test.c:55:22:55:24 | 200 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:19:5:19:17 | access to array | test.c:19:9:19:16 | ... - ... | 99.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:20:5:20:12 | access to array | test.c:20:9:20:11 | 100 | 100.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | 100.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | test.c:35:9:35:9 | 0 | 0.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | test.c:36:9:36:10 | 99 | 99.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | test.c:37:9:37:16 | ... - ... 
| 299.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | test.c:38:9:38:11 | 100 | 100.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 300.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 198.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | test.c:44:13:44:20 | ... + ... | 199.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | test.c:45:13:45:20 | ... + ... | 200.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | test.c:35:9:35:9 | 0 | 0.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | test.c:36:9:36:10 | 99 | 99.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | test.c:37:9:37:16 | ... - ... | 299.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | test.c:38:9:38:11 | 100 | 100.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 300.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 198.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | test.c:44:13:44:20 | ... + ... | 199.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | test.c:45:13:45:20 | ... + ... 
| 200.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:65:5:65:10 | access to array | test.c:65:9:65:9 | 0 | 0.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:66:5:66:12 | access to array | test.c:66:9:66:11 | 100 | 100.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:67:5:67:12 | access to array | test.c:67:9:67:11 | 200 | 200.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:68:5:68:23 | access to array | test.c:68:9:68:22 | ... - ... | 1.8446744073709552E19 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:69:5:69:19 | access to array | test.c:69:9:69:18 | alloc_size | 1.8446744073709552E19 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:73:9:73:23 | access to array | test.c:73:13:73:22 | alloc_size | 198.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:74:9:74:27 | access to array | test.c:74:13:74:26 | ... + ... | 199.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:75:9:75:27 | access to array | test.c:75:13:75:26 | ... + ... 
| 200.0 | diff --git a/session/example5.ql b/session/example5.ql index fccc1ae..3cc4e1b 100644 --- a/session/example5.ql +++ b/session/example5.ql @@ -2,10 +2,8 @@ import cpp import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -// Step 5 -from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr + +from AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr where // malloc (100) // ^^^^^^^^^^^^ AllocationExpr buffer @@ -21,22 +19,35 @@ where // malloc (100) // ^^^ allocSizeExpr / bufferSize // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) -select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure alloc and buffer access are in the same function + ensureSameFunction(buffer, access.getArrayBase()) and + // Ensure size definition and use are in same function, even for non-constant expressions. + ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) +// +select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax + +/** Ensure the two expressions are in the same function body. 
*/ +predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } + +/** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ +predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // + // Capture BOTH with dataflow: + // 1. + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // 2. + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) +} + diff --git a/session/session.org b/session/session.org index 6963891..8b3fa9f 100644 --- a/session/session.org +++ b/session/session.org @@ -267,19 +267,6 @@ To address these, take the query from the previous exercise and *** Results Now, we get 15 results, limited to statically determined values. - XX: Implement predicates - =getSourceConstantExpr=, =getFixedSize=, and =getFixedArrayOffset= - Use local data-flow analysis to complete the =getSourceConstantExpr= - predicate. The =getFixedSize= and =getFixedArrayOffset= predicates can - be completed using =getSourceConstantExpr=. - - XX: - 1. start with query. - =elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()= - 2. convert to predicate. - 3. then use classes, if desired. =class BufferAccess extends ArrayExpr= - is different from those below. - ** Step 4a -- some clean-up using predicates Note that the dataflow automatically captures/includes the @@ -297,6 +284,11 @@ To address these, take the query from the previous exercise and #+END_SRC + Also, simplify the =from...where...select=: + 1. Remove unnecessary =exists= clauses. + 2. Use =DataFlow::localExprFlow= for the buffer and allocation sizes, with + =getValue().toInt()= as one possibility (one predicate). 
+ *** Solution #+INCLUDE: "example4a.ql" src java @@ -321,24 +313,24 @@ To address these, take the query from the previous exercise and range analysis is reliably identifying integer overflow and validating integer overflow checks. - First, simplify the =from...where...select=: - 1. Remove unnecessary =exists= clauses. - 2. Use =DataFlow::localExprFlow= for the buffer and allocation sizes, not - =getValue().toInt()= - - Then, add the use of the =SimpleRangeAnalysis= library. Specifically, the + Now, add the use of the =SimpleRangeAnalysis= library. Specifically, the relevant library predicates are =upperBound= and =lowerBound=, to be used with - the buffer access argument. Experiment and decide which to use for this - exercise (=upperBound=, =lowerBound=, or both). + the buffer access argument. - This requires the import - : import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + Notes: + - This requires the import + : import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + - We are not limiting the array access to integers any longer. Thus, we just + use + : accessIdx = access.getArrayOffset() + - To see the results in the order used in the C code, use + : select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax *** Solution #+INCLUDE: "example5.ql" src java -*** Results - Now, we get 48 results. +*** First 5 results + #+INCLUDE: "../session-tests/Example5/example5.expected" :lines "-6" ** Step 6 To finally determine (some) out-of-bounds accesses, we have to convert @@ -450,11 +442,10 @@ To address these, take the query from the previous exercise and are not comparable (=<, >, <== etc.), and the /actual/ value is not known. - XX: global value numbering finds expressions with the same known value, + Global value numbering finds expressions with the same known value, independent of structure. - So, we look for and use /relative/ values between allocation and use. To do - this, use GVN. 
+ So, we look for and use /relative/ values between allocation and use. The relevant CodeQL constructs are #+BEGIN_SRC java @@ -472,6 +463,99 @@ To address these, take the query from the previous exercise and #+end_example we have to "evaluate" the expressions -- or at least bound them. +*** TODO incorporate + #+BEGIN_SRC java + /** + ,* Gets the smallest of the upper bound of `e` or the largest source value (i.e. "stated value") that flows to `e`. + ,* Because range-analysis can over-widen bounds, take the minimum of range analysis and data-flow sources. + ,* + ,* If there is no source value that flows to `e`, this predicate does not hold. + ,* + ,* This predicate, if `e` is the `sz` arg to `malloc`, would return `20` for the following: + ,* ``` + ,* size_t sz = condition ? 10 : 20; + ,* malloc(sz); + ,* ``` + ,*/ + bindingset[e] + int getMaxStatedValue(Expr e) { + result = upperBound(e).minimum(max(getSourceConstantExpr(e).getValue().toInt())) + } + #+END_SRC + +*** DONE incorporate + Done by =ensureSameFunction= instead. 
+ #+BEGIN_SRC java + predicate allocatedBufferArrayAccess(ArrayExpr access, FunctionCall alloc) { + alloc.getTarget().hasName("malloc") and + DataFlow::localExprFlow(alloc, access.getArrayBase()) + } + #+END_SRC + +*** TODO incorporate + #+BEGIN_SRC java + int getFixedArrayOffset(ArrayExpr access) { + exists(Expr base, int offset | + offset = getExprOffsetValue(access.getArrayOffset(), base) and + result = getMaxStatedValue(base) + offset + ) + } + #+END_SRC + +*** TODO incorporate + #+BEGIN_SRC java + predicate isOffsetOutOfBoundsConstant( + ArrayExpr access, FunctionCall source, int allocSize, int accessOffset + ) { + allocatedBufferArrayAccess(access, source) and + allocSize = getMaxStatedValue(source.getArgument(0)) and + accessOffset = getFixedArrayOffset(access) and + accessOffset >= allocSize + } + #+END_SRC + +*** TODO incorporate + #+BEGIN_SRC java + predicate isOffsetOutOfBoundsGVN(ArrayExpr access, FunctionCall source) { + allocatedBufferArrayAccess(access, source) and + not isOffsetOutOfBoundsConstant(access, source, _, _) and + exists(Expr accessOffsetBase, int accessOffsetBaseValue | + accessOffsetBaseValue = getExprOffsetValue(access.getArrayOffset(), accessOffsetBase) and + globalValueNumber(source.getArgument(0)) = globalValueNumber(accessOffsetBase) and + not accessOffsetBaseValue < 0 + ) + } + #+END_SRC + +*** TODO incorporate + #+BEGIN_SRC java + /** + ,* @id cpp/array-access-out-of-bounds + ,* @description Access of an array with an index that is greater or equal to the element num. 
+ ,* @kind problem + ,* @problem.severity error + ,*/ + + import cpp + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.valuenumbering.GlobalValueNumbering + import RuntimeValues + + from FunctionCall source, ArrayExpr access, string message + where + exists(int allocSize, int accessOffset | + isOffsetOutOfBoundsConstant(access, source, allocSize, accessOffset) and + message = + "Array access out of bounds: " + access.toString() + " with offset " + accessOffset.toString() + + " on $@ with size " + allocSize.toString() + ) + or + isOffsetOutOfBoundsGVN(access, source) and + message = "Array access with index that is greater or equal to the size of the $@." + select access, message, source, "allocation" + #+END_SRC + *** interim #+BEGIN_SRC java /** From 2d30b8549aa463e519cf362d3baca7c0615c76c8 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Wed, 17 May 2023 12:33:18 -0700 Subject: [PATCH 15/28] Update example 6 --- session-tests/Example6/example6.expected | 80 ++++++++++-------------- session/example6.ql | 60 +++++++++++------- session/session.org | 12 +--- 3 files changed, 72 insertions(+), 80 deletions(-) diff --git a/session-tests/Example6/example6.expected b/session-tests/Example6/example6.expected index f674929..e6f2ac1 100644 --- a/session-tests/Example6/example6.expected +++ b/session-tests/Example6/example6.expected @@ -1,48 +1,32 @@ -| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:8:5:8:10 | access to array | 0.0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:9:5:9:11 | access to array | 99.0 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:10:5:10:12 | access to array | 100.0 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 | 
file://:0:0:0:0 | char | 1 | 1 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:17:5:17:10 | access to array | 0.0 | test.c:17:9:17:9 | 0 | 100 | test.c:15:26:15:28 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:18:5:18:11 | access to array | 99.0 | test.c:18:9:18:10 | 99 | 100 | test.c:15:26:15:28 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:19:5:19:17 | access to array | 99.0 | test.c:19:9:19:16 | ... - ... | 100 | test.c:15:26:15:28 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:20:5:20:12 | access to array | 100.0 | test.c:20:9:20:11 | 100 | 100 | test.c:15:26:15:28 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:21:5:21:13 | access to array | 100.0 | test.c:21:9:21:12 | size | 100 | test.c:15:26:15:28 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | 
test.c:37:9:37:16 | ... - ... | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... 
| 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:35:9:35:9 | 0 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:36:9:36:10 | 99 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:37:9:37:16 | ... - ... 
| 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:38:9:38:11 | 100 | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:39:9:39:12 | size | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:43:13:43:16 | size | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:44:13:44:20 | ... + ... | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... 
| 100 | test.c:26:39:26:41 | 100 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:45:13:45:20 | ... + ... | 200 | test.c:26:45:26:47 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:65:5:65:10 | access to array | 0.0 | test.c:65:9:65:9 | 0 | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:66:5:66:12 | access to array | 100.0 | test.c:66:9:66:11 | 100 | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:67:5:67:12 | access to array | 200.0 | test.c:67:9:67:11 | 200 | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:68:5:68:23 | access to array | 1.8446744073709552E19 | test.c:68:9:68:22 | ... - ... | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:69:5:69:19 | access to array | 1.8446744073709552E19 | test.c:69:9:69:18 | alloc_size | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:73:9:73:23 | access to array | 198.0 | test.c:73:13:73:22 | alloc_size | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:74:9:74:27 | access to array | 199.0 | test.c:74:13:74:26 | ... + ... | 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:75:9:75:27 | access to array | 200.0 | test.c:75:13:75:26 | ... + ... 
| 200 | test.c:55:22:55:24 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:19:5:19:17 | access to array | test.c:19:9:19:16 | ... - ... | 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:20:5:20:12 | access to array | test.c:20:9:20:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | 100.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | test.c:35:9:35:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | test.c:36:9:36:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | test.c:37:9:37:16 | ... - ... 
| 299.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | test.c:38:9:38:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 300.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 198.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | test.c:44:13:44:20 | ... + ... | 199.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | test.c:45:13:45:20 | ... + ... | 200.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | test.c:35:9:35:9 | 0 | 0.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | test.c:36:9:36:10 | 99 | 99.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | test.c:37:9:37:16 | ... - ... 
| 299.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | test.c:38:9:38:11 | 100 | 100.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 300.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 198.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | test.c:44:13:44:20 | ... + ... | 199.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | test.c:45:13:45:20 | ... + ... | 200.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:65:5:65:10 | access to array | test.c:65:9:65:9 | 0 | 0.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:66:5:66:12 | access to array | test.c:66:9:66:11 | 100 | 100.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:67:5:67:12 | access to array | test.c:67:9:67:11 | 200 | 200.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:68:5:68:23 | access to array | test.c:68:9:68:22 | ... - ... 
| 1.8446744073709552E19 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:69:5:69:19 | access to array | test.c:69:9:69:18 | alloc_size | 1.8446744073709552E19 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:73:9:73:23 | access to array | test.c:73:13:73:22 | alloc_size | 198.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:74:9:74:27 | access to array | test.c:74:13:74:26 | ... + ... | 199.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:75:9:75:27 | access to array | test.c:75:13:75:26 | ... + ... | 200.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | diff --git a/session/example6.ql b/session/example6.ql index 4dc1e5e..0a69c76 100644 --- a/session/example6.ql +++ b/session/example6.ql @@ -2,39 +2,53 @@ import cpp import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -// Step 6 -from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr +from AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr where // malloc (100) // ^^^^^^^^^^^^ AllocationExpr buffer // // buf[...] // ^^^ ArrayExpr access + // // buf[...] // ^^^ int accessIdx + // accessIdx = access.getArrayOffset() and // // malloc (100) // ^^^ allocSizeExpr / bufferSize // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... 
buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) -select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, 1 as allocBaseSize + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure alloc and buffer access are in the same function + ensureSameFunction(buffer, access.getArrayBase()) and + // Ensure size definition and use are in same function, even for non-constant expressions. + ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) +// +select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, + 1 as allocBaseSize + +/** Ensure the two expressions are in the same function body. */ +predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } + +/** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ +predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // + // Capture BOTH with dataflow: + // 1. + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // 2. + // unsigned long size = 100; + // ... 
+ // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) +} diff --git a/session/session.org b/session/session.org index 8b3fa9f..4b5befb 100644 --- a/session/session.org +++ b/session/session.org @@ -354,20 +354,14 @@ To address these, take the query from the previous exercise and to the allocated memory./ So =size = 1= - 3. Note that - =allocSizeExpr.getUnspecifiedType() as allocBaseType= - is wrong here. - - 4. These test cases all use type =char=. What would happen for =int= or + 3. These test cases all use type =char=. What would happen for =int= or =double=? *** Solution #+INCLUDE: "example6.ql" src java -*** Results - 48 results in the table - - | 1 | call to malloc | 200 | access to array | 0 | 0 | 200 | 200 | char | 1 | 1 | +*** First 5 results + #+INCLUDE: "../session-tests/Example6/example6.expected" :lines "-6" ** Step 7 1. Clean up the query. From 8cea58a43eaedf155ae2da3ab582bfbdafbe0bac Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Wed, 17 May 2023 13:49:50 -0700 Subject: [PATCH 16/28] Update example7. 
Prepare to introduce more predicates --- session-tests/Example7/example7.expected | 80 +++++++----------- session/example7.ql | 57 +++++++------ session/session.org | 101 ++++++++++++++--------- 3 files changed, 126 insertions(+), 112 deletions(-) diff --git a/session-tests/Example7/example7.expected b/session-tests/Example7/example7.expected index 269d633..b2a1ce7 100644 --- a/session-tests/Example7/example7.expected +++ b/session-tests/Example7/example7.expected @@ -1,48 +1,32 @@ -| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:8:5:8:10 | access to array | 0.0 | test.c:7:24:7:26 | 100 | 100 | 0.0 | -| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:9:5:9:11 | access to array | 99.0 | test.c:7:24:7:26 | 100 | 100 | 99.0 | -| test.c:7:17:7:22 | call to malloc | test.c:7:24:7:26 | 100 | test.c:10:5:10:12 | access to array | 100.0 | test.c:7:24:7:26 | 100 | 100 | 100.0 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:17:5:17:10 | access to array | 0.0 | test.c:15:26:15:28 | 100 | 100 | 0.0 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:18:5:18:11 | access to array | 99.0 | test.c:15:26:15:28 | 100 | 100 | 99.0 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:19:5:19:17 | access to array | 99.0 | test.c:15:26:15:28 | 100 | 100 | 99.0 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:20:5:20:12 | access to array | 100.0 | test.c:15:26:15:28 | 100 | 100 | 100.0 | -| test.c:16:17:16:22 | call to malloc | test.c:15:26:15:28 | 100 | test.c:21:5:21:13 | access to array | 100.0 | test.c:15:26:15:28 | 100 | 100 | 100.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:26:39:26:41 | 100 | 100 | 0.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:35:5:35:10 | access to array | 0.0 | test.c:26:45:26:47 | 200 | 200 | 0.0 | -| 
test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:26:39:26:41 | 100 | 100 | 99.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:36:5:36:11 | access to array | 99.0 | test.c:26:45:26:47 | 200 | 200 | 99.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | test.c:26:39:26:41 | 100 | 100 | 299.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:37:5:37:17 | access to array | 299.0 | test.c:26:45:26:47 | 200 | 200 | 299.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:26:39:26:41 | 100 | 100 | 100.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:38:5:38:12 | access to array | 100.0 | test.c:26:45:26:47 | 200 | 200 | 100.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:26:39:26:41 | 100 | 100 | 300.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:39:5:39:13 | access to array | 300.0 | test.c:26:45:26:47 | 200 | 200 | 300.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:26:39:26:41 | 100 | 100 | 198.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:43:9:43:17 | access to array | 198.0 | test.c:26:45:26:47 | 200 | 200 | 198.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:26:39:26:41 | 100 | 100 | 199.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:44:9:44:21 | access to array | 199.0 | test.c:26:45:26:47 | 200 | 200 | 199.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:26:39:26:41 | 100 | 100 | 
200.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:39:26:41 | 100 | test.c:45:9:45:21 | access to array | 200.0 | test.c:26:45:26:47 | 200 | 200 | 200.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:26:39:26:41 | 100 | 100 | 0.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:35:5:35:10 | access to array | 0.0 | test.c:26:45:26:47 | 200 | 200 | 0.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:26:39:26:41 | 100 | 100 | 99.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:36:5:36:11 | access to array | 99.0 | test.c:26:45:26:47 | 200 | 200 | 99.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:26:39:26:41 | 100 | 100 | 299.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:37:5:37:17 | access to array | 299.0 | test.c:26:45:26:47 | 200 | 200 | 299.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:26:39:26:41 | 100 | 100 | 100.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:38:5:38:12 | access to array | 100.0 | test.c:26:45:26:47 | 200 | 200 | 100.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:26:39:26:41 | 100 | 100 | 300.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:39:5:39:13 | access to array | 300.0 | test.c:26:45:26:47 | 200 | 200 | 300.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:26:39:26:41 | 100 | 100 | 198.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:43:9:43:17 | access to array | 198.0 | test.c:26:45:26:47 | 200 | 
200 | 198.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:26:39:26:41 | 100 | 100 | 199.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:44:9:44:21 | access to array | 199.0 | test.c:26:45:26:47 | 200 | 200 | 199.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:26:39:26:41 | 100 | 100 | 200.0 | -| test.c:28:17:28:22 | call to malloc | test.c:26:45:26:47 | 200 | test.c:45:9:45:21 | access to array | 200.0 | test.c:26:45:26:47 | 200 | 200 | 200.0 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:65:5:65:10 | access to array | 0.0 | test.c:55:22:55:24 | 200 | 200 | 0.0 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:66:5:66:12 | access to array | 100.0 | test.c:55:22:55:24 | 200 | 200 | 100.0 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:67:5:67:12 | access to array | 200.0 | test.c:55:22:55:24 | 200 | 200 | 200.0 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:68:5:68:23 | access to array | 1.8446744073709552E19 | test.c:55:22:55:24 | 200 | 200 | 1.8446744073709552E19 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:69:5:69:19 | access to array | 1.8446744073709552E19 | test.c:55:22:55:24 | 200 | 200 | 1.8446744073709552E19 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:73:9:73:23 | access to array | 198.0 | test.c:55:22:55:24 | 200 | 200 | 198.0 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:74:9:74:27 | access to array | 199.0 | test.c:55:22:55:24 | 200 | 200 | 199.0 | -| test.c:63:17:63:22 | call to malloc | test.c:55:22:55:24 | 200 | test.c:75:9:75:27 | access to array | 200.0 | test.c:55:22:55:24 | 200 | 200 | 200.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | 
test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 100 | 0.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 100 | 99.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 100 | 100.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 100 | 0.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 100 | 99.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:19:5:19:17 | access to array | test.c:19:9:19:16 | ... - ... | 99.0 | 100 | file://:0:0:0:0 | char | 100 | 99.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:20:5:20:12 | access to array | test.c:20:9:20:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 100 | 100.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | 100.0 | 100 | file://:0:0:0:0 | char | 100 | 100.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | test.c:35:9:35:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 100 | 0.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | test.c:36:9:36:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 100 | 99.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | test.c:37:9:37:16 | ... - ... 
| 299.0 | 100 | file://:0:0:0:0 | char | 100 | 299.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | test.c:38:9:38:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 100 | 100.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 300.0 | 100 | file://:0:0:0:0 | char | 100 | 300.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 198.0 | 100 | file://:0:0:0:0 | char | 100 | 198.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | test.c:44:13:44:20 | ... + ... | 199.0 | 100 | file://:0:0:0:0 | char | 100 | 199.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | test.c:45:13:45:20 | ... + ... | 200.0 | 100 | file://:0:0:0:0 | char | 100 | 200.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | test.c:35:9:35:9 | 0 | 0.0 | 200 | file://:0:0:0:0 | char | 200 | 0.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | test.c:36:9:36:10 | 99 | 99.0 | 200 | file://:0:0:0:0 | char | 200 | 99.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | test.c:37:9:37:16 | ... - ... 
| 299.0 | 200 | file://:0:0:0:0 | char | 200 | 299.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | test.c:38:9:38:11 | 100 | 100.0 | 200 | file://:0:0:0:0 | char | 200 | 100.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 300.0 | 200 | file://:0:0:0:0 | char | 200 | 300.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 198.0 | 200 | file://:0:0:0:0 | char | 200 | 198.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | test.c:44:13:44:20 | ... + ... | 199.0 | 200 | file://:0:0:0:0 | char | 200 | 199.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | test.c:45:13:45:20 | ... + ... | 200.0 | 200 | file://:0:0:0:0 | char | 200 | 200.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:65:5:65:10 | access to array | test.c:65:9:65:9 | 0 | 0.0 | 200 | file://:0:0:0:0 | char | 200 | 0.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:66:5:66:12 | access to array | test.c:66:9:66:11 | 100 | 100.0 | 200 | file://:0:0:0:0 | char | 200 | 100.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:67:5:67:12 | access to array | test.c:67:9:67:11 | 200 | 200.0 | 200 | file://:0:0:0:0 | char | 200 | 200.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:68:5:68:23 | access to array | test.c:68:9:68:22 | ... - ... 
| 1.8446744073709552E19 | 200 | file://:0:0:0:0 | char | 200 | 1.8446744073709552E19 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:69:5:69:19 | access to array | test.c:69:9:69:18 | alloc_size | 1.8446744073709552E19 | 200 | file://:0:0:0:0 | char | 200 | 1.8446744073709552E19 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:73:9:73:23 | access to array | test.c:73:13:73:22 | alloc_size | 198.0 | 200 | file://:0:0:0:0 | char | 200 | 198.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:74:9:74:27 | access to array | test.c:74:13:74:26 | ... + ... | 199.0 | 200 | file://:0:0:0:0 | char | 200 | 199.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:75:9:75:27 | access to array | test.c:75:13:75:26 | ... + ... | 200.0 | 200 | file://:0:0:0:0 | char | 200 | 200.0 | diff --git a/session/example7.ql b/session/example7.ql index 6ebb29d..0e2372e 100644 --- a/session/example7.ql +++ b/session/example7.ql @@ -2,14 +2,12 @@ import cpp import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -// Step 7 from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr, + int arrayTypeSize, int allocBaseSize where // malloc (100) // ^^^^^^^^^^^^ AllocationExpr buffer - // // buf[...] // ^^^ ArrayExpr access // buf[...] @@ -19,26 +17,35 @@ where // malloc (100) // ^^^ allocSizeExpr / bufferSize // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... 
- // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure alloc and buffer access are in the same function + ensureSameFunction(buffer, access.getArrayBase()) and + // Ensure size defintion and use are in same function, even for non-constant expressions. + ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and // - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() - and + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and 1 = allocBaseSize - and - DataFlow::localExprFlow(buffer, access.getArrayBase()) -select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, allocSizeExpr, allocBaseSize * allocsize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex +// +select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, + allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex + +/** Ensure the two expressions are in the same function body. */ +predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } + +/** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ +predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // Capture BOTH with datflow: + // 1. + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // 2. + // unsigned long size = 100; ... 
; char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) +} diff --git a/session/session.org b/session/session.org index 4b5befb..935735c 100644 --- a/session/session.org +++ b/session/session.org @@ -365,19 +365,56 @@ To address these, take the query from the previous exercise and ** Step 7 1. Clean up the query. - 2. Add expressions for =allocatedUnits= (from the malloc) and a + 2. Compare buffer allocation size to the access index. + 3. Add expressions for =allocatedUnits= (from the malloc) and a =maxAccessedIndex= (from array accesses) - 3. Compare buffer allocation size to the access index. + 1. Calculate the =accessOffset= / =maxAccessedIndex= (from array accesses) + 2. Calculate the =allocSize= / =allocatedUnits= (from the malloc) + 3. Compare them *** Solution: #+INCLUDE: "example7.ql" src java -*** Results - 48 results in the much cleaner table +*** First 5 results + #+INCLUDE: "../session-tests/Example7/example7.expected" :lines "-6"’ - | no. | buffer | bufferSizeExpr | access | accessMax | allocSizeExpr | allocatedUnits | maxAccessedIndex | | - | 1 | call to malloc | 200 | access to array | 0 | 200 | 200 | 0 | | +** Step 7a + Introduce more general predicates. + 1. Move these into a single predicate, =isOffsetOutOfBoundsConstant= + +*** TODO incoporate + #+BEGIN_SRC java + /** + ,* Gets the smallest of the upper bound of `e` or the largest source value (i.e. "stated value") that flows to `e`. + ,* Because range-analysis can over-widen bounds, take the minimum of range analysis and data-flow sources. + ,* + ,* If there is no source value that flows to `e`, this predicate does not hold. + ,* + ,* This predicate, if `e` is the `sz` arg to `malloc`, would return `20` for the following: + ,* ``` + ,* size_t sz = condition ? 
10 : 20; + ,* malloc(sz); + ,* ``` + ,*/ + bindingset[e] + int getMaxStatedValue(Expr e) { + result = upperBound(e).minimum(max(getSourceConstantExpr(e).getValue().toInt())) + } + #+END_SRC +*** TODO incoporate + #+BEGIN_SRC java + predicate isOffsetOutOfBoundsConstant( + ArrayExpr access, FunctionCall source, int allocSize, int accessOffset + ) { + ensureSameFunction(access, source) and + // allocatedBufferArrayAccess(access, source) and + allocSize = getMaxStatedValue(source.getArgument(0)) and + accessOffset = getFixedArrayOffset(access) and + accessOffset >= allocSize + } + #+END_SRC + ** Step 8 1. Clean up the query. 2. Compare buffer allocation size to the access index. @@ -457,26 +494,6 @@ To address these, take the query from the previous exercise and #+end_example we have to "evaluate" the expressions -- or at least bound them. -*** TODO incoporate - #+BEGIN_SRC java - /** - ,* Gets the smallest of the upper bound of `e` or the largest source value (i.e. "stated value") that flows to `e`. - ,* Because range-analysis can over-widen bounds, take the minimum of range analysis and data-flow sources. - ,* - ,* If there is no source value that flows to `e`, this predicate does not hold. - ,* - ,* This predicate, if `e` is the `sz` arg to `malloc`, would return `20` for the following: - ,* ``` - ,* size_t sz = condition ? 10 : 20; - ,* malloc(sz); - ,* ``` - ,*/ - bindingset[e] - int getMaxStatedValue(Expr e) { - result = upperBound(e).minimum(max(getSourceConstantExpr(e).getValue().toInt())) - } - #+END_SRC - *** DONE incorporate Done by =ensureSameFunction= instead. 
#+BEGIN_SRC java @@ -486,6 +503,24 @@ To address these, take the query from the previous exercise and } #+END_SRC +*** TODO incorporate + #+BEGIN_SRC java + bindingset[expr] + int getExprOffsetValue(Expr expr, Expr base) { + result = expr.(AddExpr).getRightOperand().getValue().toInt() and + base = expr.(AddExpr).getLeftOperand() + or + result = -expr.(SubExpr).getRightOperand().getValue().toInt() and + base = expr.(SubExpr).getLeftOperand() + or + // currently only AddExpr and SubExpr are supported: else, fall-back to 0 + not expr instanceof AddExpr and + not expr instanceof SubExpr and + base = expr and + result = 0 + } + #+END_SRC + *** TODO incoporate #+BEGIN_SRC java int getFixedArrayOffset(ArrayExpr access) { @@ -496,22 +531,10 @@ To address these, take the query from the previous exercise and } #+END_SRC -*** TODO incoporate - #+BEGIN_SRC java - predicate isOffsetOutOfBoundsConstant( - ArrayExpr access, FunctionCall source, int allocSize, int accessOffset - ) { - allocatedBufferArrayAccess(access, source) and - allocSize = getMaxStatedValue(source.getArgument(0)) and - accessOffset = getFixedArrayOffset(access) and - accessOffset >= allocSize - } - #+END_SRC - *** TODO incoporate #+BEGIN_SRC java predicate isOffsetOutOfBoundsGVN(ArrayExpr access, FunctionCall source) { - allocatedBufferArrayAccess(access, source) and + ensureSameFunction(access, source) and not isOffsetOutOfBoundsConstant(access, source, _, _) and exists(Expr accessOffsetBase, int accessOffsetBaseValue | accessOffsetBaseValue = getExprOffsetValue(access.getArrayOffset(), accessOffsetBase) and From e790a3cbc09843af14354a32a79d1059c422953f Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Wed, 17 May 2023 16:33:08 -0700 Subject: [PATCH 17/28] Added interim step 7a --- session-tests/Example7a/example7a.expected | 32 ++ session-tests/Example7a/example7a.qlref | 1 + session-tests/Example7a/test.c | 85 +++ session/example7a.ql | 48 ++ session/session.md | 606 +++++++++++++++------ 
session/session.org | 23 +- 6 files changed, 625 insertions(+), 170 deletions(-) create mode 100644 session-tests/Example7a/example7a.expected create mode 100644 session-tests/Example7a/example7a.qlref create mode 100644 session-tests/Example7a/test.c create mode 100644 session/example7a.ql diff --git a/session-tests/Example7a/example7a.expected b/session-tests/Example7a/example7a.expected new file mode 100644 index 0000000..1211e12 --- /dev/null +++ b/session-tests/Example7a/example7a.expected @@ -0,0 +1,32 @@ +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 0.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 99.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 100.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 0.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 99.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:19:5:19:17 | access to array | test.c:19:9:19:16 | ... - ... 
| 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 99.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:20:5:20:12 | access to array | test.c:20:9:20:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 100.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | 100.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 100.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | test.c:35:9:35:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 0.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | test.c:36:9:36:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 99.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | test.c:37:9:37:16 | ... - ... | 299.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 299.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | test.c:38:9:38:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 100.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 300.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 300.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 198.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 198.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | test.c:44:13:44:20 | ... + ... | 199.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 199.0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | test.c:45:13:45:20 | ... + ... 
| 200.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 200.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | test.c:35:9:35:9 | 0 | 0.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 0.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | test.c:36:9:36:10 | 99 | 99.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 99.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | test.c:37:9:37:16 | ... - ... | 299.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 299.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | test.c:38:9:38:11 | 100 | 100.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 100.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 300.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 300.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 198.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 198.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | test.c:44:13:44:20 | ... + ... | 199.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 199.0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | test.c:45:13:45:20 | ... + ... 
| 200.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 200.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:65:5:65:10 | access to array | test.c:65:9:65:9 | 0 | 0.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 0.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:66:5:66:12 | access to array | test.c:66:9:66:11 | 100 | 100.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 100.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:67:5:67:12 | access to array | test.c:67:9:67:11 | 200 | 200.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 200.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:68:5:68:23 | access to array | test.c:68:9:68:22 | ... - ... | 1.8446744073709552E19 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 1.8446744073709552E19 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:69:5:69:19 | access to array | test.c:69:9:69:18 | alloc_size | 1.8446744073709552E19 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 1.8446744073709552E19 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:73:9:73:23 | access to array | test.c:73:13:73:22 | alloc_size | 198.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 198.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:74:9:74:27 | access to array | test.c:74:13:74:26 | ... + ... | 199.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 199.0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:75:9:75:27 | access to array | test.c:75:13:75:26 | ... + ... 
| 200.0 | 200 | file://:0:0:0:0 | char | 1 | 1 | 200 | 200.0 | diff --git a/session-tests/Example7a/example7a.qlref b/session-tests/Example7a/example7a.qlref new file mode 100644 index 0000000..92b49c1 --- /dev/null +++ b/session-tests/Example7a/example7a.qlref @@ -0,0 +1 @@ +example7a.ql diff --git a/session-tests/Example7a/test.c b/session-tests/Example7a/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example7a/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 
100 : 200);
+
+    char *buf = malloc(size);
+
+    if (random_condition)
+    {
+        size = 300;
+    }
+
+    buf[0]; // COMPLIANT
+    buf[99]; // COMPLIANT
+    buf[size - 1]; // NON_COMPLIANT
+    buf[100]; // NON_COMPLIANT[DONT REPORT]
+    buf[size]; // NON_COMPLIANT
+
+    if (size < 199)
+    {
+        buf[size]; // COMPLIANT
+        buf[size + 1]; // COMPLIANT
+        buf[size + 2]; // NON_COMPLIANT
+    }
+}
+
+void test_const_branch2(int mode)
+{
+    unsigned long alloc_size = 0;
+
+    if (mode == 1)
+    {
+        alloc_size = 200;
+    }
+    else
+    {
+        // unknown const size - don't report accesses
+        alloc_size = extern_get_size();
+    }
+
+    char *buf = malloc(alloc_size);
+
+    buf[0]; // COMPLIANT
+    buf[100]; // COMPLIANT
+    buf[200]; // NON_COMPLIANT
+    buf[alloc_size - 1]; // COMPLIANT
+    buf[alloc_size]; // NON_COMPLIANT
+
+    if (alloc_size < 199)
+    {
+        buf[alloc_size]; // COMPLIANT
+        buf[alloc_size + 1]; // COMPLIANT
+        buf[alloc_size + 2]; // NON_COMPLIANT
+    }
+}
+
+void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz)
+{
+    char *buf = malloc(sz * x * y);
+    buf[sz * x * y - 1]; // COMPLIANT
+    buf[sz * x * y]; // NON_COMPLIANT
+    buf[sz * x * y + 1]; // NON_COMPLIANT
+}
diff --git a/session/example7a.ql b/session/example7a.ql
new file mode 100644
index 0000000..8210a23
--- /dev/null
+++ b/session/example7a.ql
@@ -0,0 +1,48 @@
+import cpp
+import semmle.code.cpp.dataflow.DataFlow
+import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis
+
+from
+  AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr,
+  int arrayTypeSize, int allocBaseSize
+where
+  // malloc (100)
+  // ^^^^^^^^^^^^ AllocationExpr buffer
+  // buf[...]
+  // ^^^^^^^^ ArrayExpr access
+  //     ^^^ Expr accessIdx
+  accessIdx = access.getArrayOffset() and
+  getAllocConstantExpr(bufferSizeExpr, bufferSize) and
+  // Ensure alloc and buffer access are in the same function
+  ensureSameFunction(buffer, access.getArrayBase()) and
+  // Ensure size definition and use are in same function, even for non-constant expressions.
+  ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and
+  //
+  arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and
+  1 = allocBaseSize
+//
+select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize,
+  access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType,
+  buffer.getSizeMult() as bufferBaseTypeSize,
+  arrayBaseType.getSize() as arrayBaseTypeSize,
+  allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex
+
+/** Ensure the two expressions are in the same function body. */
+predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) }
+
+/**
+ * Gets an expression that flows to the allocation (which includes those already in the allocation)
+ * and has a constant value.
+ */
+predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) {
+  exists(AllocationExpr buffer |
+    // Capture BOTH with dataflow:
+    // 1.
+    // malloc (100)
+    //         ^^^ bufferSize
+    // 2.
+    // unsigned long size = 100; ...
; char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) +} diff --git a/session/session.md b/session/session.md index 0611e8b..e66a0df 100644 --- a/session/session.md +++ b/session/session.md @@ -6,37 +6,48 @@ - [Session/Workshop notes](#sessionworkshop-notes) - [Step 1](#exercise-1) - [Hints](#hints) - - [Solution](#orgdb027c4) - - [Step 2](#org103a4a0) + - [Solution](#org2fc84f1) + - [Step 2](#org81cc6bb) - [Hints](#hints) - - [Solution](#org9a38362) - - [Results](#org0373f43) + - [Solution](#orgd82dd3e) + - [Results](#orgb9c5185) - [Step 3](#exercise-2) - - [Solution](#org100e79f) - - [Results](#orgc91557b) - - [Step 4](#org5ed496c) - - [Hint](#org353b905) - - [Solution](#org5e7a90d) - - [Results](#orgc376727) - - [Step 4a – some clean-up using predicates](#org5fbc890) - - [Solution](#orgc0ef9cd) - - [Step 5 – SimpleRangeAnalysis](#org3250327) - - [Solution](#orga42f2d0) - - [Results](#org2dd5caf) - - [Step 6](#org7bfbf7f) - - [Solution](#orgbf7f580) - - [Results](#orgc770d19) - - [Step 7](#org27d428b) - - [Solution:](#org1d7080e) - - [Results](#orgcc04f97) - - [Step 8](#orgd04e3b9) - - [Solution:](#org6638628) - - [Results](#org42712eb) - - [Interim notes](#org4007937) - - [Step 9 – Global Value Numbering](#orgde8bf97) - - [interim](#org08c13b9) - - [interim](#org11a3f79) - - [hashconsing](#orgc7ce1fc) + - [Solution](#orgbfcc6e8) + - [Results](#org77d93e8) + - [Step 4](#orgcac2df5) + - [Hint](#orge73d8c3) + - [Solution](#orgcce7aa5) + - [Results](#org58fd83d) + - [Step 4a – some clean-up using predicates](#org1a3052f) + - [Solution](#orgf922609) + - [Step 5 – SimpleRangeAnalysis](#org0df2f23) + - [Solution](#orgb23c26e) + - [First 5 results](#org921d64a) + - [Step 6](#org2b0d3ac) + - [Solution](#orgdd0881f) + - [First 5 results](#org3e7d47c) + - [Step 7](#org00edfe5) + - [Solution:](#org8a3a4b1) + - [First 5 results](#org2f15e3e) + - [Step 7a](#orgfa97dcd) + - 
[Solution:](#org3894df3) + - [First 5 results](#orgcbdf216) + - [Step 7b](#org58aba89) + - [incoporate](#orgd60c31b) + - [incoporate](#org3319200) + - [Step 8](#orgcfcb55c) + - [Solution:](#orgede6c66) + - [Results](#orgfa54b95) + - [Interim notes](#org68d8dfb) + - [Step 9 – Global Value Numbering](#orge1acc6c) + - [incorporate](#orgb19cfc3) + - [incorporate](#orgda109c3) + - [incoporate](#org8d0c13d) + - [incoporate](#org45411bb) + - [incoporate](#org364861b) + - [interim](#orgc0ae12b) + - [interim](#orgb2f39ee) + - [hashconsing](#org6332f3e) @@ -154,8 +165,9 @@ To find these issues, 1. We can implement an analysis that tracks the upper or lower bounds on an expression. 2. We then combine this with data-flow analysis to reduce false positives and identify cases where the index of the array results in an access beyond the allocated size of the buffer. 3. We further extend these queries with rudimentary arithmetic support involving expressions common to the allocation and the array access. -4. For cases where this is insufficient, we introduce global value numbering [GVN](https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering) in [Step 9 – Global Value Numbering](#orgde8bf97), to detect values known to be equal at runtime. -5. When *those* cases are insufficient, and we handle the case of identical structure using [hashconsing](#orgc7ce1fc). +4. For cases where constant expressions are not available or are uncertain, we first try [range analysis](#org0df2f23) to expand the query's applicability. +5. For cases where this is insufficient, we introduce global value numbering [GVN](https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering) in [Step 9 – Global Value Numbering](#orge1acc6c), to detect values known to be equal at runtime. +6. When *those* cases are insufficient, we handle the case of identical structure using [hashconsing](#org6332f3e). 
@@ -187,7 +199,7 @@ in [db.c](file:///Users/hohn/local/codeql-workshop-runtime-values-c/session-db/D 1. `Expr::getValue()::toInt()` can be used to get the integer value of a constant expression. - + ### Solution @@ -219,7 +231,7 @@ select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSize This produces 12 results, with some cross-function pairs. - + ## Step 2 @@ -239,7 +251,7 @@ To address these, take the query from the previous exercise and 2. The the array base is the `buf` part of `buf[0]`. Use the `Expr.getArrayBase()` predicate. - + ### Solution @@ -281,7 +293,7 @@ select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSize ``` - + ### Results @@ -309,7 +321,7 @@ Here, the `malloc` argument is a variable with known value. We include this result by removing the size-retrieval from the prior query. - + ### Solution @@ -350,14 +362,14 @@ select buffer, access, accessIdx, access.getArrayOffset() ``` - + ### Results Now, we get 12 results, including some from other test cases. - + ## Step 4 @@ -370,12 +382,12 @@ Note the results for the cases in `test_const_var` which involve a variable acce We have an expression `size` that flows into the `malloc()` call. - + ### Hint - + ### Solution @@ -423,22 +435,14 @@ select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bse ``` - + ### Results Now, we get 15 results, limited to statically determined values. -XX: Implement predicates `getSourceConstantExpr`, `getFixedSize`, and `getFixedArrayOffset` Use local data-flow analysis to complete the `getSourceConstantExpr` predicate. The `getFixedSize` and `getFixedArrayOffset` predicates can be completed using `getSourceConstantExpr`. -XX: - -1. start with query. `elementSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()` -2. convert to predicate. -3. then use classes, if desired. `class BufferAccess extends ArrayExpr` is different from those below. 
- - - + ## Step 4a – some clean-up using predicates @@ -460,8 +464,13 @@ DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and ``` +Also, simplify the `from...where...select`: + +1. Remove unnecessary `exists` clauses. +2. Use `DataFlow::localExprFlow` for the buffer and allocation sizes, with `getValue().toInt()` as one possibility (one predicate). - + + ### Solution @@ -515,7 +524,7 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { ``` - + ## Step 5 – SimpleRangeAnalysis @@ -527,19 +536,22 @@ The CodeQL standard library has several mechanisms for addressing this problem; Although not in the scope of this workshop, a standard use-case for range analysis is reliably identifying integer overflow and validating integer overflow checks. -First, simplify the `from...where...select`: - -1. Remove unnecessary `exists` clauses. -2. Use `DataFlow::localExprFlow` for the buffer and allocation sizes, not `getValue().toInt()` - -Then, add the use of the `SimpleRangeAnalysis` library. Specifically, the relevant library predicates are `upperBound` and `lowerBound`, to be used with the buffer access argument. Experiment and decide which to use for this exercise (`upperBound`, `lowerBound`, or both). +Now, add the use of the `SimpleRangeAnalysis` library. Specifically, the relevant library predicates are `upperBound` and `lowerBound`, to be used with the buffer access argument. -This requires the import +Notes: - import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis +- This requires the import + + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis +- We are not limiting the array access to integers any longer. 
Thus, we just use + + accessIdx = access.getArrayOffset() +- To see the results in the order used in the C code, use + + select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax - + ### Solution @@ -548,10 +560,8 @@ import cpp import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -// Step 5 -from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr + +from AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr where // malloc (100) // ^^^^^^^^^^^^ AllocationExpr buffer @@ -567,36 +577,52 @@ where // malloc (100) // ^^^ allocSizeExpr / bufferSize // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) -select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure alloc and buffer access are in the same function + ensureSameFunction(buffer, access.getArrayBase()) and + // Ensure size definition and use are in same function, even for non-constant expressions. + ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) +// +select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax + +/** Ensure the two expressions are in the same function body.
*/ +predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } + +/** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ +predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // + // Capture BOTH with dataflow: + // 1. + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // 2. + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) +} ``` - + -### Results +### First 5 results -Now, we get 48 results. +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | - + ## Step 6 @@ -613,12 +639,10 @@ Hints: 2. Note from the docs: *The malloc() function allocates size bytes of memory and returns a pointer to the allocated memory.* So `size = 1` -3. Note that `allocSizeExpr.getUnspecifiedType() as allocBaseType` is wrong here. - -4. These test cases all use type `char`. What would happen for `int` or `double`?
- + ### Solution @@ -627,66 +651,83 @@ import cpp import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -// Step 6 -from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr +from AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr where // malloc (100) // ^^^^^^^^^^^^ AllocationExpr buffer // // buf[...] // ^^^ ArrayExpr access + // // buf[...] // ^^^ int accessIdx + // accessIdx = access.getArrayOffset() and // // malloc (100) // ^^^ allocSizeExpr / bufferSize // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) -select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, accessIdx, allocsize, allocSizeExpr, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, 1 as allocBaseSize -``` + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure alloc and buffer access are in the same function + ensureSameFunction(buffer, access.getArrayBase()) and + // Ensure size defintion and use are in same function, even for non-constant expressions. 
+ ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) +// +select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, + 1 as allocBaseSize +/** Ensure the two expressions are in the same function body. */ +predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - +/** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ +predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // + // Capture BOTH with datflow: + // 1. + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // 2. + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) +} +``` -### Results -48 results in the table + + +### First 5 results -| | | | | | | | | | | | -|--- |-------------- |--- |--------------- |--- |--- |--- |--- |---- |--- |--- | -| 1 | call to malloc | 200 | access to array | 0 | 0 | 200 | 200 | char | 1 | 1 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | | char | 1 | 1 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | | char | 1 | 1 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | | char | 1 | 1 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | | char | 1 | 1 | +| test.c:15:26:15:28 | 100 | 
test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | | char | 1 | 1 | - + ## Step 7 1. Clean up the query. -2. Add expressions for `allocatedUnits` (from the malloc) and a `maxAccessedIndex` (from array accesses) -3. Compare buffer allocation size to the access index. +2. Compare buffer allocation size to the access index. +3. Add expressions for `allocatedUnits` (from the malloc) and a `maxAccessedIndex` (from array accesses) + 1. Calculate the `accessOffset` / `maxAccessedIndex` (from array accesses) + 2. Calculate the `allocSize` / `allocatedUnits` (from the malloc) + 3. Compare them - + ### Solution: @@ -695,14 +736,12 @@ import cpp import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -// Step 7 from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr, + int arrayTypeSize, int allocBaseSize where // malloc (100) // ^^^^^^^^^^^^ AllocationExpr buffer - // // buf[...] // ^^^ ArrayExpr access // buf[...] @@ -712,43 +751,180 @@ where // malloc (100) // ^^^ allocSizeExpr / bufferSize // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure alloc and buffer access are in the same function + ensureSameFunction(buffer, access.getArrayBase()) and + // Ensure size defintion and use are in same function, even for non-constant expressions. + ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and // - // unsigned long size = 100; - // ... 
- // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + 1 = allocBaseSize +// +select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, + allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex + +/** Ensure the two expressions are in the same function body. */ +predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } + +/** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ +predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // Capture BOTH with dataflow: + // 1. + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // 2. + // unsigned long size = 100; ...
; char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) +} +``` + + + + +### First 5 results + +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | | char | 100 | 0.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | | char | 100 | 99.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | | char | 100 | 100.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | | char | 100 | 0.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | | char | 100 | 99.0 | + + + + +## Step 7a + +1. Account for base sizes – `char` in this case. +2. Put all expressions into the select for review. + + + + +### Solution: + +```java +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr, + int arrayTypeSize, int allocBaseSize +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // buf[...] + // ^^^^^^^^ ArrayExpr access + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure alloc and buffer access are in the same function + ensureSameFunction(buffer, access.getArrayBase()) and + // Ensure size defintion and use are in same function, even for non-constant expressions. 
+ ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and // - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() - and + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and 1 = allocBaseSize - and - DataFlow::localExprFlow(buffer, access.getArrayBase()) -select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, allocSizeExpr, allocBaseSize * allocsize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex +// +select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, + buffer.getSizeMult() as bufferBaseTypeSize, + arrayBaseType.getSize() as arrayBaseTypeSize, + allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex + +/** Ensure the two expressions are in the same function body. */ +predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } + +/** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ +predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // Capture BOTH with dataflow: + // 1. + // malloc (100) + // ^^^ bufferSize + // 2. + // unsigned long size = 100; ...
; char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) +} ``` - + + +### First 5 results + +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | | char | 1 | 1 | 100 | 0.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | | char | 1 | 1 | 100 | 99.0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | | char | 1 | 1 | 100 | 100.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | | char | 1 | 1 | 100 | 0.0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | | char | 1 | 1 | 100 | 99.0 | + + + + +## Step 7b + +Introduce more general predicates. + +1. Move these into a single predicate, `isOffsetOutOfBoundsConstant` -### Results -48 results in the much cleaner table + -| no. | buffer | bufferSizeExpr | access | accessMax | allocSizeExpr | allocatedUnits | maxAccessedIndex | | -| 1 | call to malloc | 200 | access to array | 0 | 200 | 200 | 0 | | +### TODO incorporate +```java +/** + * Gets the smallest of the upper bound of `e` or the largest source value + * (i.e. "stated value") that flows to `e`. Because range-analysis can over-widen + * bounds, take the minimum of range analysis and data-flow sources. + * + * If there is no source value that flows to `e`, this predicate does not hold. + * + * This predicate, if `e` is the `sz` arg to `malloc`, would return `20` for the + * following: + * + * size_t sz = condition ?
10 : 20; + * malloc(sz); + * + */ +bindingset[e] +int getMaxStatedValue(Expr e) { + result = upperBound(e).minimum(max(getSourceConstantExpr(e).getValue().toInt())) +} +``` - + + + +### TODO incorporate + +```java +predicate isOffsetOutOfBoundsConstant( + ArrayExpr access, FunctionCall source, int allocSize, int accessOffset +) { + ensureSameFunction(access, source) and + // allocatedBufferArrayAccess(access, source) and + allocSize = getMaxStatedValue(source.getArgument(0)) and + accessOffset = getFixedArrayOffset(access) and + accessOffset >= allocSize +} +``` + + + ## Step 8 @@ -766,7 +942,7 @@ select buffer, bufferSizeExpr, access, upperBound(accessIdx) as accessMax, alloc to get nicer reporting. - + ### Solution: @@ -826,7 +1002,7 @@ select access, "Array access at or beyond size; have "+allocatedUnits + " units, ``` - + ### Results @@ -837,7 +1013,7 @@ select access, "Array access at or beyond size; have "+allocatedUnits + " units, | Array access at or beyond size; have 200 units, access at 200 | db.c:67:5 | - + ## Interim notes @@ -850,7 +1026,7 @@ int val = rand() ? rand() : 30; A similar case is present in the `test_const_branch` and `test_const_branch2` test-cases. In these cases, it is necessary to augment range analysis with data-flow and restrict the bounds to the upper or lower bound of computable constants that flow to a given expression. Another approach is global value numbering, used next. - + ## Step 9 – Global Value Numbering @@ -868,9 +1044,9 @@ Reference: , <=` etc.), and the *actual* value is not known. -XX: global value numbering finds expressions with the same known value, independent of structure. +Global value numbering finds expressions with the same known value, independent of structure. -So, we look for and use *relative* values between allocation and use. To do this, use GVN. +So, we look for and use *relative* values between allocation and use.
The relevant CodeQL constructs are @@ -889,7 +1065,107 @@ We can use global value numbering to identify common values as first step, but f we have to "evaluate" the expressions – or at least bound them. - + + +### DONE incorporate + +Done by `ensureSameFunction` instead. + +```java +predicate allocatedBufferArrayAccess(ArrayExpr access, FunctionCall alloc) { + alloc.getTarget().hasName("malloc") and + DataFlow::localExprFlow(alloc, access.getArrayBase()) +} +``` + + + + +### TODO incorporate + +```java +bindingset[expr] +int getExprOffsetValue(Expr expr, Expr base) { + result = expr.(AddExpr).getRightOperand().getValue().toInt() and + base = expr.(AddExpr).getLeftOperand() + or + result = -expr.(SubExpr).getRightOperand().getValue().toInt() and + base = expr.(SubExpr).getLeftOperand() + or + // currently only AddExpr and SubExpr are supported: else, fall-back to 0 + not expr instanceof AddExpr and + not expr instanceof SubExpr and + base = expr and + result = 0 +} +``` + + + + +### TODO incorporate + +```java +int getFixedArrayOffset(ArrayExpr access) { + exists(Expr base, int offset | + offset = getExprOffsetValue(access.getArrayOffset(), base) and + result = getMaxStatedValue(base) + offset + ) +} +``` + + + + +### TODO incorporate + +```java +predicate isOffsetOutOfBoundsGVN(ArrayExpr access, FunctionCall source) { + ensureSameFunction(access, source) and + not isOffsetOutOfBoundsConstant(access, source, _, _) and + exists(Expr accessOffsetBase, int accessOffsetBaseValue | + accessOffsetBaseValue = getExprOffsetValue(access.getArrayOffset(), accessOffsetBase) and + globalValueNumber(source.getArgument(0)) = globalValueNumber(accessOffsetBase) and + not accessOffsetBaseValue < 0 + ) +} +``` + + + + +### TODO incorporate + +```java +/** + * @id cpp/array-access-out-of-bounds + * @description Access of an array with an index that is greater or equal to the element num.
+ * @kind problem + * @problem.severity error + */ + +import cpp +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.valuenumbering.GlobalValueNumbering +import RuntimeValues + +from FunctionCall source, ArrayExpr access, string message +where + exists(int allocSize, int accessOffset | + isOffsetOutOfBoundsConstant(access, source, allocSize, accessOffset) and + message = + "Array access out of bounds: " + access.toString() + " with offset " + accessOffset.toString() + + " on $@ with size " + allocSize.toString() + ) + or + isOffsetOutOfBoundsGVN(access, source) and + message = "Array access with index that is greater or equal to the size of the $@." +select access, message, source, "allocation" +``` + + + ### interim @@ -952,7 +1228,7 @@ select access, ``` - + ### interim @@ -1011,7 +1287,7 @@ select access, gvnAccess, gvnAlloc ``` - + ## TODO hashconsing diff --git a/session/session.org b/session/session.org index 935735c..5856fd3 100644 --- a/session/session.org +++ b/session/session.org @@ -379,22 +379,35 @@ To address these, take the query from the previous exercise and #+INCLUDE: "../session-tests/Example7/example7.expected" :lines "-6"’ ** Step 7a + 1. Account for base sizes -- =char= in this case. + 2. Put all expressions into the select for review. + +*** Solution: + #+INCLUDE: "example7a.ql" src java + +*** First 5 results + #+INCLUDE: "../session-tests/Example7a/example7a.expected" :lines "-6"’ + +** Step 7b Introduce more general predicates. + 1. Move these into a single predicate, =isOffsetOutOfBoundsConstant= *** TODO incoporate #+BEGIN_SRC java /** - ,* Gets the smallest of the upper bound of `e` or the largest source value (i.e. "stated value") that flows to `e`. - ,* Because range-analysis can over-widen bounds, take the minimum of range analysis and data-flow sources. + ,* Gets the smallest of the upper bound of `e` or the largest source value + ,* (i.e. 
"stated value") that flows to `e`. Because range-analysis can over-widen + ,* bounds, take the minimum of range analysis and data-flow sources. ,* ,* If there is no source value that flows to `e`, this predicate does not hold. ,* - ,* This predicate, if `e` is the `sz` arg to `malloc`, would return `20` for the following: - ,* ``` + ,* This predicate, if `e` is the `sz` arg to `malloc`, would return `20` for the + ,* following: + ,* ,* size_t sz = condition ? 10 : 20; ,* malloc(sz); - ,* ``` + ,* ,*/ bindingset[e] int getMaxStatedValue(Expr e) { From da7781a99ffae2e7b785abd5c8e801f931bf1a33 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Wed, 17 May 2023 20:11:17 -0700 Subject: [PATCH 18/28] Added interim step 7a, detailing some predicate derivation --- session-tests/Example7b/example7b.expected | 31 +++++++ session-tests/Example7b/example7b.qlref | 1 + session-tests/Example7b/test.c | 85 +++++++++++++++++++ session/example7b.ql | 98 ++++++++++++++++++++++ 4 files changed, 215 insertions(+) create mode 100644 session-tests/Example7b/example7b.expected create mode 100644 session-tests/Example7b/example7b.qlref create mode 100644 session-tests/Example7b/test.c create mode 100644 session/example7b.ql diff --git a/session-tests/Example7b/example7b.expected b/session-tests/Example7b/example7b.expected new file mode 100644 index 0000000..609165e --- /dev/null +++ b/session-tests/Example7b/example7b.expected @@ -0,0 +1,31 @@ +WARNING: Unused predicate computeIndices (/Users/hohn/local/codeql-workshop-runtime-values-c/session/example7b.ql:59,11-25) +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 100 | 0 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 100 | 99 | +| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | 100 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access 
to array | 100 | 0 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | 100 | 99 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:19:5:19:17 | access to array | 100 | 99 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:20:5:20:12 | access to array | 100 | 100 | +| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:21:5:21:13 | access to array | 100 | 100 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | 100 | 0 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | 100 | 99 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | 100 | 299 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | 100 | 100 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to array | 100 | 300 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | 100 | 198 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | 100 | 199 | +| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | 100 | 200 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | 200 | 0 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | 200 | 99 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | 200 | 299 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | 200 | 100 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to 
array | 200 | 300 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | 200 | 198 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | 200 | 199 | +| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | 200 | 200 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:65:5:65:10 | access to array | 200 | 0 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:66:5:66:12 | access to array | 200 | 100 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:67:5:67:12 | access to array | 200 | 200 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:73:9:73:23 | access to array | 200 | 198 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:74:9:74:27 | access to array | 200 | 199 | +| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:75:9:75:27 | access to array | 200 | 200 | diff --git a/session-tests/Example7b/example7b.qlref b/session-tests/Example7b/example7b.qlref new file mode 100644 index 0000000..6f8007b --- /dev/null +++ b/session-tests/Example7b/example7b.qlref @@ -0,0 +1 @@ +example7b.ql diff --git a/session-tests/Example7b/test.c b/session-tests/Example7b/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example7b/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int 
random_condition) +{ + unsigned long size = (mode == 1 ? 100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session/example7b.ql b/session/example7b.ql new file mode 100644 index 0000000..88c6b2d --- /dev/null +++ b/session/example7b.ql @@ -0,0 +1,98 @@ +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +from + AllocationExpr buffer, ArrayExpr access, int bufferSize, Expr bufferSizeExpr, + int maxAccessedIndex, int allocatedUnits +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure alloc and buffer access are in the same function + ensureSameFunction(buffer, access.getArrayBase()) and + // Ensure size defintion and use are in same function, even for non-constant expressions. 
+ ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and + // computeIndices(access, buffer, bufferSize, allocatedUnits, maxAccessedIndex) + computeAllocationSize(buffer, bufferSize, allocatedUnits) and + computeMaxAccess(access, maxAccessedIndex) +select bufferSizeExpr, buffer, access, allocatedUnits, maxAccessedIndex + +/** + * Compute the maximum accessed index. + */ +predicate computeMaxAccess(ArrayExpr access, int maxAccessedIndex) { + exists( + int arrayTypeSize, int accessMax, Type arrayBaseType, int arrayBaseTypeSize, Expr accessIdx + | + // buf[...] + // ^^^^^^^^ ArrayExpr access + // ^^^ + accessIdx = access.getArrayOffset() and + upperBound(accessIdx) = accessMax and + arrayBaseType.getSize() = arrayBaseTypeSize and + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() = arrayBaseType and + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + arrayTypeSize * accessMax = maxAccessedIndex + ) +} + +/** + * Compute the allocation size. + */ +bindingset[bufferSize] +predicate computeAllocationSize(AllocationExpr buffer, int bufferSize, int allocatedUnits) { + exists(int bufferBaseTypeSize, Type arrayBaseType, int arrayBaseTypeSize | + // buf[...] + // ^^^^^^^^ ArrayExpr access + // ^^^ + buffer.getSizeMult() = bufferBaseTypeSize and + arrayBaseType.getSize() = arrayBaseTypeSize and + bufferSize * bufferBaseTypeSize = allocatedUnits + ) +} + +/** + * Compute the allocation size and the maximum accessed index for the allocation and access. + */ +bindingset[bufferSize] +predicate computeIndices( + ArrayExpr access, AllocationExpr buffer, int bufferSize, int allocatedUnits, int maxAccessedIndex +) { + exists( + int arrayTypeSize, int accessMax, int bufferBaseTypeSize, Type arrayBaseType, + int arrayBaseTypeSize, Expr accessIdx + | + // buf[...] 
+ // ^^^^^^^^ ArrayExpr access + // ^^^ + accessIdx = access.getArrayOffset() and + upperBound(accessIdx) = accessMax and + buffer.getSizeMult() = bufferBaseTypeSize and + arrayBaseType.getSize() = arrayBaseTypeSize and + bufferSize * bufferBaseTypeSize = allocatedUnits and + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() = arrayBaseType and + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + arrayTypeSize * accessMax = maxAccessedIndex + ) +} + +/** Ensure the two expressions are in the same function body. */ +predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } + +/** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ +predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // Capture BOTH with datflow: + // 1. + // malloc (100) + // ^^^ bufferSize + // 2. + // unsigned long size = 100; ... 
; char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) +} From bedd04938a81b05b9067a3f7cb261769b1a19f80 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Thu, 18 May 2023 18:57:18 -0700 Subject: [PATCH 19/28] Include size comparison in example7b, from 8 --- session-tests/Example7b/example7b.expected | 46 +++------ session/example7b.ql | 16 ++-- session/session.org | 106 +++++++++++++-------- 3 files changed, 91 insertions(+), 77 deletions(-) diff --git a/session-tests/Example7b/example7b.expected b/session-tests/Example7b/example7b.expected index 609165e..585f0dc 100644 --- a/session-tests/Example7b/example7b.expected +++ b/session-tests/Example7b/example7b.expected @@ -1,31 +1,15 @@ -WARNING: Unused predicate computeIndices (/Users/hohn/local/codeql-workshop-runtime-values-c/session/example7b.ql:59,11-25) -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 100 | 0 | -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 100 | 99 | -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | 100 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | 100 | 0 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | 100 | 99 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:19:5:19:17 | access to array | 100 | 99 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:20:5:20:12 | access to array | 100 | 100 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:21:5:21:13 | access to array | 100 | 100 | -| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | 100 | 0 | -| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | 
call to malloc | test.c:36:5:36:11 | access to array | 100 | 99 | -| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | 100 | 299 | -| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | 100 | 100 | -| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to array | 100 | 300 | -| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | 100 | 198 | -| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | 100 | 199 | -| test.c:26:39:26:41 | 100 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | 100 | 200 | -| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:35:5:35:10 | access to array | 200 | 0 | -| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:36:5:36:11 | access to array | 200 | 99 | -| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:37:5:37:17 | access to array | 200 | 299 | -| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:38:5:38:12 | access to array | 200 | 100 | -| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:39:5:39:13 | access to array | 200 | 300 | -| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:43:9:43:17 | access to array | 200 | 198 | -| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:44:9:44:21 | access to array | 200 | 199 | -| test.c:26:45:26:47 | 200 | test.c:28:17:28:22 | call to malloc | test.c:45:9:45:21 | access to array | 200 | 200 | -| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:65:5:65:10 | access to array | 200 | 0 | -| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:66:5:66:12 | access to array | 200 | 100 | -| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | 
call to malloc | test.c:67:5:67:12 | access to array | 200 | 200 | -| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:73:9:73:23 | access to array | 200 | 198 | -| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:74:9:74:27 | access to array | 200 | 199 | -| test.c:55:22:55:24 | 200 | test.c:63:17:63:22 | call to malloc | test.c:75:9:75:27 | access to array | 200 | 200 | +WARNING: Unused predicate computeIndices (/Users/hohn/local/codeql-workshop-runtime-values-c/session/example7b.ql:65,11-25) +| test.c:10:5:10:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | +| test.c:20:5:20:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | +| test.c:21:5:21:13 | access to array | Array access at or beyond size; have 100 units, access at 100 | +| test.c:37:5:37:17 | access to array | Array access at or beyond size; have 100 units, access at 299 | +| test.c:37:5:37:17 | access to array | Array access at or beyond size; have 200 units, access at 299 | +| test.c:38:5:38:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | +| test.c:39:5:39:13 | access to array | Array access at or beyond size; have 100 units, access at 300 | +| test.c:39:5:39:13 | access to array | Array access at or beyond size; have 200 units, access at 300 | +| test.c:43:9:43:17 | access to array | Array access at or beyond size; have 100 units, access at 198 | +| test.c:44:9:44:21 | access to array | Array access at or beyond size; have 100 units, access at 199 | +| test.c:45:9:45:21 | access to array | Array access at or beyond size; have 100 units, access at 200 | +| test.c:45:9:45:21 | access to array | Array access at or beyond size; have 200 units, access at 200 | +| test.c:67:5:67:12 | access to array | Array access at or beyond size; have 200 units, access at 200 | +| test.c:75:9:75:27 | access to array | Array access at or beyond size; have 200 units, 
access at 200 | diff --git a/session/example7b.ql b/session/example7b.ql index 88c6b2d..385474a 100644 --- a/session/example7b.ql +++ b/session/example7b.ql @@ -16,7 +16,13 @@ where // computeIndices(access, buffer, bufferSize, allocatedUnits, maxAccessedIndex) computeAllocationSize(buffer, bufferSize, allocatedUnits) and computeMaxAccess(access, maxAccessedIndex) -select bufferSizeExpr, buffer, access, allocatedUnits, maxAccessedIndex + // only consider out-of-bounds + and + maxAccessedIndex >= allocatedUnits +select access, + "Array access at or beyond size; have " + allocatedUnits + " units, access at " + maxAccessedIndex + +// select bufferSizeExpr, buffer, access, allocatedUnits, maxAccessedIndex /** * Compute the maximum accessed index. @@ -87,11 +93,9 @@ predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { exists(AllocationExpr buffer | // Capture BOTH with datflow: - // 1. - // malloc (100) - // ^^^ bufferSize - // 2. - // unsigned long size = 100; ... ; char *buf = malloc(size); + // 1. malloc (100) + // ^^^ bufferSize + // 2. unsigned long size = 100; ... ; char *buf = malloc(size); DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and bufferSizeExpr.getValue().toInt() = bufferSize ) diff --git a/session/session.org b/session/session.org index 5856fd3..8aa8dac 100644 --- a/session/session.org +++ b/session/session.org @@ -389,9 +389,72 @@ To address these, take the query from the previous exercise and #+INCLUDE: "../session-tests/Example7a/example7a.expected" :lines "-6"’ ** Step 7b - Introduce more general predicates. + 1. Introduce more general predicates. + 2. Compare buffer allocation size to the access index. + 3. Report only the questionable entries. + +*** Solution: + #+INCLUDE: "example7b.ql" src java + +*** First 5 results + #+INCLUDE: "../session-tests/Example7b/example7b.expected" :lines "-6"’ + +** Step 8 + 1. Clean up the query. + 2. 
Find the constant offset in differences of the form + : buf[sz * x * y - 1]; + and sums of the form + : buf[alloc_size + 1]; + via a predicate. Keep both terms for later use. - 1. Move these into a single predicate, =isOffsetOutOfBoundsConstant= + and + extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) + and + extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) + +*** TODO incorporate + #+BEGIN_SRC java + /** + ,* Extract base and offset from y = base+offset and y = base-offset. For others, get y and 0. + ,* + ,* For cases like + ,* buf[sz * x * y - 1]; + ,* and + ,* buf[alloc_size + 1]; + ,*/ + bindingset[expr] + predicate extractBaseAndOffset(Expr expr, Expr base, int offset) { + offset = expr.(AddExpr).getRightOperand().getValue().toInt() and + base = expr.(AddExpr).getLeftOperand() + or + offset = -expr.(SubExpr).getRightOperand().getValue().toInt() and + base = expr.(SubExpr).getLeftOperand() + or + // Currently only AddExpr and SubExpr are supported: else, fall-back to 0 + not expr instanceof AddExpr and + not expr instanceof SubExpr and + base = expr and + offset = 0 + } + #+END_SRC + + instead of + #+BEGIN_SRC java + bindingset[expr] + int getExprOffsetValue(Expr expr, Expr base) { + result = expr.(AddExpr).getRightOperand().getValue().toInt() and + base = expr.(AddExpr).getLeftOperand() + or + result = -expr.(SubExpr).getRightOperand().getValue().toInt() and + base = expr.(SubExpr).getLeftOperand() + or + // currently only AddExpr and SubExpr are supported: else, fall-back to 0 + not expr instanceof AddExpr and + not expr instanceof SubExpr and + base = expr and + result = 0 + } + #+END_SRC *** TODO incoporate #+BEGIN_SRC java @@ -412,6 +475,7 @@ To address these, take the query from the previous exercise and bindingset[e] int getMaxStatedValue(Expr e) { result = upperBound(e).minimum(max(getSourceConstantExpr(e).getValue().toInt())) +// getAllocConstantExpr, not getSourceConstantExpr } #+END_SRC @@ -427,26 +491,6 @@ To address 
these, take the query from the previous exercise and accessOffset >= allocSize } #+END_SRC - -** Step 8 - 1. Clean up the query. - 2. Compare buffer allocation size to the access index. - 3. Report only the questionable entries. - 4. Use - #+BEGIN_SRC java - /** - ,* @kind problem - ,*/ - #+END_SRC - to get nicer reporting. - -*** Solution: - #+INCLUDE: "example8.ql" src java - -*** Results - 14 results in the much cleaner table - - | Array access at or beyond size; have 200 units, access at 200 | db.c:67:5 | ** Interim notes A common issue with the =SimpleRangeAnalysis= library is handling of @@ -516,24 +560,6 @@ To address these, take the query from the previous exercise and } #+END_SRC -*** TODO incorporate - #+BEGIN_SRC java - bindingset[expr] - int getExprOffsetValue(Expr expr, Expr base) { - result = expr.(AddExpr).getRightOperand().getValue().toInt() and - base = expr.(AddExpr).getLeftOperand() - or - result = -expr.(SubExpr).getRightOperand().getValue().toInt() and - base = expr.(SubExpr).getLeftOperand() - or - // currently only AddExpr and SubExpr are supported: else, fall-back to 0 - not expr instanceof AddExpr and - not expr instanceof SubExpr and - base = expr and - result = 0 - } - #+END_SRC - *** TODO incoporate #+BEGIN_SRC java int getFixedArrayOffset(ArrayExpr access) { From 841a33b438540058c4a9419017a5ef01d9a307f2 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Fri, 19 May 2023 20:50:12 -0700 Subject: [PATCH 20/28] Add session snapshot script --- snapshot-from | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) create mode 100755 snapshot-from diff --git a/snapshot-from b/snapshot-from new file mode 100755 index 0000000..ddc1706 --- /dev/null +++ b/snapshot-from @@ -0,0 +1,40 @@ +#!/bin/bash -e +usage="$0 from.ql to + +Create +session-tests/queryfile/ +├── queryfile.expected +├── queryfile.qlref +└── test.c + +Example: +$0 session/example8.ql example8a +" + +if [ $# -ne 2 ]; then + echo "$usage" + exit 1 +fi +
+query=$1 +from=$(basename "$1" .ql) +to=session-tests/$2 +tof=$2 + +if [ ! -f $query ] ; then + echo "Missing source query file $query (1st argument)" + exit 1 +fi + +echo "Creating test directory $to" +mkdir -p $to +echo "no value" > $to/$tof.expected +echo $tof.ql > $to/$tof.qlref +cp session-db/DB/db.c $to/test.c + +echo "Creating source file session/$tof.ql" +cp $query session/$tof.ql + + + + From c04eebe67819eab9634ae13c0590f31442790183 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Fri, 19 May 2023 20:50:58 -0700 Subject: [PATCH 21/28] Update step 8 --- session-tests/Example8/example8.expected | 26 ++--- session-tests/example8a/example8a.expected | 12 ++ session-tests/example8a/example8a.qlref | 1 + session-tests/example8a/test.c | 85 ++++++++++++++ session/example8.ql | 93 ++++++++------- session/example8a.ql | 59 ++++++++++ session/session.org | 127 ++++++++------------- 7 files changed, 269 insertions(+), 134 deletions(-) create mode 100644 session-tests/example8a/example8a.expected create mode 100644 session-tests/example8a/example8a.qlref create mode 100644 session-tests/example8a/test.c create mode 100644 session/example8a.ql diff --git a/session-tests/Example8/example8.expected b/session-tests/Example8/example8.expected index abca8e7..407663f 100644 --- a/session-tests/Example8/example8.expected +++ b/session-tests/Example8/example8.expected @@ -1,14 +1,12 @@ -| test.c:10:5:10:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | -| test.c:20:5:20:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | -| test.c:21:5:21:13 | access to array | Array access at or beyond size; have 100 units, access at 100 | -| test.c:37:5:37:17 | access to array | Array access at or beyond size; have 100 units, access at 299 | -| test.c:37:5:37:17 | access to array | Array access at or beyond size; have 200 units, access at 299 | -| test.c:38:5:38:12 | access to array | Array access at or beyond
size; have 100 units, access at 100 | -| test.c:39:5:39:13 | access to array | Array access at or beyond size; have 100 units, access at 300 | -| test.c:39:5:39:13 | access to array | Array access at or beyond size; have 200 units, access at 300 | -| test.c:43:9:43:17 | access to array | Array access at or beyond size; have 100 units, access at 198 | -| test.c:44:9:44:21 | access to array | Array access at or beyond size; have 100 units, access at 199 | -| test.c:45:9:45:21 | access to array | Array access at or beyond size; have 100 units, access at 200 | -| test.c:45:9:45:21 | access to array | Array access at or beyond size; have 200 units, access at 200 | -| test.c:67:5:67:12 | access to array | Array access at or beyond size; have 200 units, access at 200 | -| test.c:75:9:75:27 | access to array | Array access at or beyond size; have 200 units, access at 200 | +| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:19:5:19:17 | access to array | test.c:19:9:19:12 | size | -1 | test.c:15:19:15:22 | size | test.c:15:19:15:22 | size | +| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | 0 | test.c:15:19:15:22 | size | test.c:15:19:15:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:37:5:37:17 | access to array | test.c:37:9:37:12 | size | -1 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:44:9:44:21 | access to array | 
test.c:44:13:44:16 | size | 1 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:45:9:45:21 | access to array | test.c:45:13:45:16 | size | 2 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:68:5:68:23 | access to array | test.c:68:9:68:18 | alloc_size | -1 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:69:5:69:19 | access to array | test.c:69:9:69:18 | alloc_size | 0 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:73:9:73:23 | access to array | test.c:73:13:73:22 | alloc_size | 0 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:74:9:74:27 | access to array | test.c:74:13:74:22 | alloc_size | 1 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:75:9:75:27 | access to array | test.c:75:13:75:22 | alloc_size | 2 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | diff --git a/session-tests/example8a/example8a.expected b/session-tests/example8a/example8a.expected new file mode 100644 index 0000000..407663f --- /dev/null +++ b/session-tests/example8a/example8a.expected @@ -0,0 +1,12 @@ +| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:19:5:19:17 | access to array | test.c:19:9:19:12 | size | -1 | test.c:15:19:15:22 | size | test.c:15:19:15:22 | size | +| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | 0 | test.c:15:19:15:22 | size | 
test.c:15:19:15:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:37:5:37:17 | access to array | test.c:37:9:37:12 | size | -1 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:44:9:44:21 | access to array | test.c:44:13:44:16 | size | 1 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:45:9:45:21 | access to array | test.c:45:13:45:16 | size | 2 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:68:5:68:23 | access to array | test.c:68:9:68:18 | alloc_size | -1 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:69:5:69:19 | access to array | test.c:69:9:69:18 | alloc_size | 0 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:73:9:73:23 | access to array | test.c:73:13:73:22 | alloc_size | 0 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:74:9:74:27 | access to array | test.c:74:13:74:22 | alloc_size | 1 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | 
test.c:75:9:75:27 | access to array | test.c:75:13:75:22 | alloc_size | 2 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | diff --git a/session-tests/example8a/example8a.qlref b/session-tests/example8a/example8a.qlref new file mode 100644 index 0000000..770eadf --- /dev/null +++ b/session-tests/example8a/example8a.qlref @@ -0,0 +1 @@ +example8a.ql diff --git a/session-tests/example8a/test.c b/session-tests/example8a/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/example8a/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 
100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session/example8.ql b/session/example8.ql index ec2b598..90a690f 100644 --- a/session/example8.ql +++ b/session/example8.ql @@ -1,52 +1,59 @@ -/** - * @kind problem - */ - import cpp import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -// Step 8 from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize, int accessMax, - int allocatedUnits, int maxAccessedIndex + AllocationExpr buffer, ArrayExpr access, Expr bufferSizeExpr, + // --- + // int maxAccessedIndex, int allocatedUnits, + // int bufferSize + int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, + Variable accessInit where - // malloc (100) + // malloc (...) 
// ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // buf[...] - // ^^^ int accessIdx - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); + // --- + // getAllocConstExpr(...) + // +++ + bufferSizeExpr = buffer.getSizeExpr() and + // Ensure buffer access refers to the matching allocation + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure buffer access refers to the matching allocation DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ // - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and - 1 = allocBaseSize and - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - upperBound(accessIdx) = accessMax and - allocBaseSize * allocsize = allocatedUnits and - arrayTypeSize * accessMax = maxAccessedIndex and - // only consider out-of-bounds - maxAccessedIndex >= allocatedUnits -select access, "Array access at or beyond size; have "+allocatedUnits + " units, access at "+ maxAccessedIndex + // +++ + // base+offset + extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) and + extractBaseAndOffset(access.getArrayOffset(), accessBase, accessOffset) and + // +++ + // Same initializer variable + bufferBase.(VariableAccess).getTarget() = bufInit and + accessBase.(VariableAccess).getTarget() = accessInit and + bufInit = accessInit +// +++ +// Identify questionable differences +select buffer, bufferBase, bufferOffset, access, accessBase, accessOffset, bufInit, 
accessInit + +/** + * Extract base and offset from y = base+offset and y = base-offset. For others, get y and 0. + * + * For cases like + * buf[alloc_size + 1]; + * + * The more general + * buf[sz * x * y - 1]; + * requires other tools. + */ +bindingset[expr] +predicate extractBaseAndOffset(Expr expr, Expr base, int offset) { + offset = expr.(AddExpr).getRightOperand().getValue().toInt() and + base = expr.(AddExpr).getLeftOperand() + or + offset = -expr.(SubExpr).getRightOperand().getValue().toInt() and + base = expr.(SubExpr).getLeftOperand() + or + not expr instanceof AddExpr and + not expr instanceof SubExpr and + base = expr and + offset = 0 +} diff --git a/session/example8a.ql b/session/example8a.ql new file mode 100644 index 0000000..90a690f --- /dev/null +++ b/session/example8a.ql @@ -0,0 +1,59 @@ +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +from + AllocationExpr buffer, ArrayExpr access, Expr bufferSizeExpr, + // --- + // int maxAccessedIndex, int allocatedUnits, + // int bufferSize + int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, + Variable accessInit +where + // malloc (...) + // ^^^^^^^^^^^^ AllocationExpr buffer + // --- + // getAllocConstExpr(...) 
+ // +++ + bufferSizeExpr = buffer.getSizeExpr() and + // Ensure buffer access refers to the matching allocation + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure buffer access refers to the matching allocation + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + // + // +++ + // base+offset + extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) and + extractBaseAndOffset(access.getArrayOffset(), accessBase, accessOffset) and + // +++ + // Same initializer variable + bufferBase.(VariableAccess).getTarget() = bufInit and + accessBase.(VariableAccess).getTarget() = accessInit and + bufInit = accessInit +// +++ +// Identify questionable differences +select buffer, bufferBase, bufferOffset, access, accessBase, accessOffset, bufInit, accessInit + +/** + * Extract base and offset from y = base+offset and y = base-offset. For others, get y and 0. + * + * For cases like + * buf[alloc_size + 1]; + * + * The more general + * buf[sz * x * y - 1]; + * requires other tools. + */ +bindingset[expr] +predicate extractBaseAndOffset(Expr expr, Expr base, int offset) { + offset = expr.(AddExpr).getRightOperand().getValue().toInt() and + base = expr.(AddExpr).getLeftOperand() + or + offset = -expr.(SubExpr).getRightOperand().getValue().toInt() and + base = expr.(SubExpr).getLeftOperand() + or + not expr instanceof AddExpr and + not expr instanceof SubExpr and + base = expr and + offset = 0 +} diff --git a/session/session.org b/session/session.org index 8aa8dac..19540d0 100644 --- a/session/session.org +++ b/session/session.org @@ -400,84 +400,51 @@ To address these, take the query from the previous exercise and #+INCLUDE: "../session-tests/Example7b/example7b.expected" :lines "-6"’ ** Step 8 - 1. Clean up the query. - 2. Find the constant offset in differences of the form - : buf[sz * x * y - 1]; - and sums of the form - : buf[alloc_size + 1]; - via a predicate. Keep both terms for later use. 
- - and - extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) - and - extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) - -*** TODO incorporate - #+BEGIN_SRC java - /** - ,* Extract base and offset from y = base+offset and y = base-offset. For others, get y and 0. - ,* - ,* For cases like - ,* buf[sz * x * y - 1]; - ,* and - ,* buf[alloc_size + 1]; - ,*/ - bindingset[expr] - predicate extractBaseAndOffset(Expr expr, Expr base, int offset) { - offset = expr.(AddExpr).getRightOperand().getValue().toInt() and - base = expr.(AddExpr).getLeftOperand() - or - offset = -expr.(SubExpr).getRightOperand().getValue().toInt() and - base = expr.(SubExpr).getLeftOperand() - or - // Currently only AddExpr and SubExpr are supported: else, fall-back to 0 - not expr instanceof AddExpr and - not expr instanceof SubExpr and - base = expr and - offset = 0 - } - #+END_SRC - - instead of - #+BEGIN_SRC java - bindingset[expr] - int getExprOffsetValue(Expr expr, Expr base) { - result = expr.(AddExpr).getRightOperand().getValue().toInt() and - base = expr.(AddExpr).getLeftOperand() - or - result = -expr.(SubExpr).getRightOperand().getValue().toInt() and - base = expr.(SubExpr).getLeftOperand() - or - // currently only AddExpr and SubExpr are supported: else, fall-back to 0 - not expr instanceof AddExpr and - not expr instanceof SubExpr and - base = expr and - result = 0 - } - #+END_SRC + Up to now, we have dealt with constant values + #+BEGIN_SRC c++ + char *buf = malloc(100); + buf[0]; // COMPLIANT + #+END_SRC + or + #+BEGIN_SRC c++ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + #+END_SRC + and statically determinable or boundable values + #+BEGIN_SRC c++ + char *buf = malloc(size); + if (size < 199) + { + buf[size]; // COMPLIANT + // ... + } + #+END_SRC -*** TODO incoporate - #+BEGIN_SRC java - /** - ,* Gets the smallest of the upper bound of `e` or the largest source value - ,* (i.e. "stated value") that flows to `e`. 
Because range-analysis can over-widen - ,* bounds, take the minimum of range analysis and data-flow sources. - ,* - ,* If there is no source value that flows to `e`, this predicate does not hold. - ,* - ,* This predicate, if `e` is the `sz` arg to `malloc`, would return `20` for the - ,* following: - ,* - ,* size_t sz = condition ? 10 : 20; - ,* malloc(sz); - ,* - ,*/ - bindingset[e] - int getMaxStatedValue(Expr e) { - result = upperBound(e).minimum(max(getSourceConstantExpr(e).getValue().toInt())) -// getAllocConstantExpr, not getSourceConstantExpr - } - #+END_SRC + There is another statically determinable case. Examples are + 1. A simple expression + #+BEGIN_SRC c++ + char *buf = malloc(alloc_size); + // ... + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + #+END_SRC + 2. A complex expression + #+BEGIN_SRC c++ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + #+END_SRC + These both have the form =malloc(e)=, =buf[e+c]=, where =e= is an =Expr= and + =c= is a constant, possibly 0. Our existing queries only report known or + boundable results, but here =e= is neither. + + Write a new query, re-using or modifying the existing one to handle the simple + expression (case 1). + + Note: + - We are looking at the allocation expression again, not its possible value. + - This only handles very specific cases. Constructing counterexamples is easy. + - We will address this in the next section. 
*** TODO incoporate #+BEGIN_SRC java @@ -492,6 +459,12 @@ To address these, take the query from the previous exercise and } #+END_SRC +*** Solution: + #+INCLUDE: "example8.ql" src java + +*** First 5 results + #+INCLUDE: "../session-tests/Example8/example8.expected" :lines "-6"’ + ** Interim notes A common issue with the =SimpleRangeAnalysis= library is handling of cases where the bounds are undeterminable at compile-time on one or more From 5a40e93d02ffd20316265cf5e0e17333760e9455 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Mon, 22 May 2023 14:47:22 -0700 Subject: [PATCH 22/28] Clarified the alloc and buffer access check intention --- session-tests/Example7b/example7b.expected | 2 +- session/example2.ql | 2 +- session/example3.ql | 2 +- session/example4.ql | 2 +- session/example4a.ql | 15 +++++++-------- session/example5.ql | 14 ++++++-------- session/example6.ql | 14 ++++++-------- session/example7.ql | 12 +++++------- session/example7a.ql | 12 +++++------- session/example7b.ql | 12 +++++------- 10 files changed, 38 insertions(+), 49 deletions(-) diff --git a/session-tests/Example7b/example7b.expected b/session-tests/Example7b/example7b.expected index 585f0dc..fdf766e 100644 --- a/session-tests/Example7b/example7b.expected +++ b/session-tests/Example7b/example7b.expected @@ -1,4 +1,4 @@ -WARNING: Unused predicate computeIndices (/Users/hohn/local/codeql-workshop-runtime-values-c/session/example7b.ql:65,11-25) +WARNING: Unused predicate computeIndices (/Users/hohn/local/codeql-workshop-runtime-values-c/session/example7b.ql:66,11-25) | test.c:10:5:10:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | | test.c:20:5:20:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | | test.c:21:5:21:13 | access to array | Array access at or beyond size; have 100 units, access at 100 | diff --git a/session/example2.ql b/session/example2.ql index ea93823..d5f4306 100644 --- a/session/example2.ql +++ 
b/session/example2.ql @@ -23,7 +23,7 @@ where allocSizeExpr = buffer.(Call).getArgument(0) and bufferSize = allocSizeExpr.getValue().toInt() and // - // Ensure alloc and buffer access are in the same function + // Ensure buffer access is to the correct allocation. // char *buf = ... buf[0]; // ^^^ ---> ^^^ // or diff --git a/session/example3.ql b/session/example3.ql index a93c6b6..267d79b 100644 --- a/session/example3.ql +++ b/session/example3.ql @@ -22,7 +22,7 @@ where allocSizeExpr = buffer.(Call).getArgument(0) and // bufferSize = allocSizeExpr.getValue().toInt() and // - // Ensure alloc and buffer access are in the same function + // Ensure buffer access is to the correct allocation. // char *buf = ... buf[0]; // ^^^ ---> ^^^ // or diff --git a/session/example4.ql b/session/example4.ql index 0ae4a0e..b0bdf63 100644 --- a/session/example4.ql +++ b/session/example4.ql @@ -29,7 +29,7 @@ where bufferSizeExpr.getValue().toInt() = bufferSize and bse = bufferSizeExpr ) and - // Ensure alloc and buffer access are in the same function + // Ensure buffer access is to the correct allocation. // char *buf = ... buf[0]; // ^^^ ---> ^^^ // or diff --git a/session/example4a.ql b/session/example4a.ql index b90a317..3550038 100644 --- a/session/example4a.ql +++ b/session/example4a.ql @@ -14,16 +14,15 @@ where // accessIdx = access.getArrayOffset().getValue().toInt() and getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure alloc and buffer access are in the same function - ensureSameFunction(buffer, access.getArrayBase()) and - // Ensure size defintion and use are in same function, even for non-constant expressions. 
- ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) -// + // Ensure buffer access refers to the matching allocation + // ensureSameFunction(buffer, access.getArrayBase()) and + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure buffer access refers to the matching allocation + // ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) + // select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bufferSizeExpr -/** Ensure the two expressions are in the same function body. */ -predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - /** * Gets an expression that flows to the allocation (which includes those already in the allocation) * and has a constant value. diff --git a/session/example5.ql b/session/example5.ql index 3cc4e1b..321d4ac 100644 --- a/session/example5.ql +++ b/session/example5.ql @@ -20,16 +20,14 @@ where // ^^^ allocSizeExpr / bufferSize // getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure alloc and buffer access are in the same function - ensureSameFunction(buffer, access.getArrayBase()) and - // Ensure size defintion and use are in same function, even for non-constant expressions. - ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) -// + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size definition, even for non-constant + // expressions. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) + // select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax -/** Ensure the two expressions are in the same function body. */ -predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - /** * Gets an expression that flows to the allocation (which includes those already in the allocation) * and has a constant value.
diff --git a/session/example6.ql b/session/example6.ql index 0a69c76..1ef2ca7 100644 --- a/session/example6.ql +++ b/session/example6.ql @@ -19,19 +19,17 @@ where // ^^^ allocSizeExpr / bufferSize // getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure alloc and buffer access are in the same function - ensureSameFunction(buffer, access.getArrayBase()) and - // Ensure size defintion and use are in same function, even for non-constant expressions. - ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) -// + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size definition, even for non-constant + // expressions. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) + // select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, 1 as allocBaseSize -/** Ensure the two expressions are in the same function body. */ -predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - /** * Gets an expression that flows to the allocation (which includes those already in the allocation) * and has a constant value. diff --git a/session/example7.ql b/session/example7.ql index 0e2372e..d255bf2 100644 --- a/session/example7.ql +++ b/session/example7.ql @@ -18,10 +18,11 @@ where // ^^^ allocSizeExpr / bufferSize // getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure alloc and buffer access are in the same function - ensureSameFunction(buffer, access.getArrayBase()) and - // Ensure size defintion and use are in same function, even for non-constant expressions. - ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and + // Ensure buffer access is to the correct allocation.
+ DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size definition, even for non-constant + // expressions. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and // arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and 1 = allocBaseSize @@ -30,9 +31,6 @@ select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as acces access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex -/** Ensure the two expressions are in the same function body. */ -predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - /** * Gets an expression that flows to the allocation (which includes those already in the allocation) * and has a constant value. diff --git a/session/example7a.ql b/session/example7a.ql index 8210a23..12fabb2 100644 --- a/session/example7a.ql +++ b/session/example7a.ql @@ -13,10 +13,11 @@ where // ^^^ int accessIdx accessIdx = access.getArrayOffset() and getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure alloc and buffer access are in the same function - ensureSameFunction(buffer, access.getArrayBase()) and - // Ensure size defintion and use are in same function, even for non-constant expressions. - ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size definition, even for non-constant + // expressions.
+ DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and // arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and 1 = allocBaseSize @@ -27,9 +28,6 @@ select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as acces arrayBaseType.getSize() as arrayBaseTypeSize, allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex -/** Ensure the two expressions are in the same function body. */ -predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - /** * Gets an expression that flows to the allocation (which includes those already in the allocation) * and has a constant value. diff --git a/session/example7b.ql b/session/example7b.ql index 385474a..3daa12f 100644 --- a/session/example7b.ql +++ b/session/example7b.ql @@ -9,10 +9,11 @@ where // malloc (100) // ^^^^^^^^^^^^ AllocationExpr buffer getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure alloc and buffer access are in the same function - ensureSameFunction(buffer, access.getArrayBase()) and - // Ensure size defintion and use are in same function, even for non-constant expressions. - ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size definition, even for non-constant + // expressions. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and // computeIndices(access, buffer, bufferSize, allocatedUnits, maxAccessedIndex) computeAllocationSize(buffer, bufferSize, allocatedUnits) and computeMaxAccess(access, maxAccessedIndex) @@ -83,9 +84,6 @@ predicate computeIndices( ) } -/** Ensure the two expressions are in the same function body.
*/ -predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - /** * Gets an expression that flows to the allocation (which includes those already in the allocation) * and has a constant value. From 646c70fdc9068119460ffe2c39d43b1092c50b40 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Mon, 22 May 2023 15:15:57 -0700 Subject: [PATCH 23/28] step 8a: Find /some/ problematic accesses by reverting to simple var+const checks --- session-tests/example8a/example8a.expected | 21 +++++------ session/example8a.ql | 12 ++++--- session/session.org | 42 ++++++++++------------ 3 files changed, 35 insertions(+), 40 deletions(-) diff --git a/session-tests/example8a/example8a.expected b/session-tests/example8a/example8a.expected index 407663f..501d32d 100644 --- a/session-tests/example8a/example8a.expected +++ b/session-tests/example8a/example8a.expected @@ -1,12 +1,9 @@ -| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:19:5:19:17 | access to array | test.c:19:9:19:12 | size | -1 | test.c:15:19:15:22 | size | test.c:15:19:15:22 | size | -| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | 0 | test.c:15:19:15:22 | size | test.c:15:19:15:22 | size | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:37:5:37:17 | access to array | test.c:37:9:37:12 | size | -1 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:44:9:44:21 | access to array | 
test.c:44:13:44:16 | size | 1 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:45:9:45:21 | access to array | test.c:45:13:45:16 | size | 2 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | -| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:68:5:68:23 | access to array | test.c:68:9:68:18 | alloc_size | -1 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | -| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:69:5:69:19 | access to array | test.c:69:9:69:18 | alloc_size | 0 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | -| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:73:9:73:23 | access to array | test.c:73:13:73:22 | alloc_size | 0 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | -| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:74:9:74:27 | access to array | test.c:74:13:74:22 | alloc_size | 1 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | -| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | 0 | test.c:75:9:75:27 | access to array | test.c:75:13:75:22 | alloc_size | 2 | test.c:51:19:51:28 | alloc_size | test.c:51:19:51:28 | alloc_size | +| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | test.c:15:19:15:22 | size | 0 | test.c:15:19:15:22 | size | 0 | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 0 | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 0 | +| 
test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:44:9:44:21 | access to array | test.c:44:13:44:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:45:9:45:21 | access to array | test.c:45:13:45:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 2 | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | test.c:69:5:69:19 | access to array | test.c:69:9:69:18 | alloc_size | test.c:51:19:51:28 | alloc_size | 0 | test.c:51:19:51:28 | alloc_size | 0 | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | test.c:73:9:73:23 | access to array | test.c:73:13:73:22 | alloc_size | test.c:51:19:51:28 | alloc_size | 0 | test.c:51:19:51:28 | alloc_size | 0 | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | test.c:74:9:74:27 | access to array | test.c:74:13:74:22 | alloc_size | test.c:51:19:51:28 | alloc_size | 0 | test.c:51:19:51:28 | alloc_size | 1 | +| test.c:63:17:63:22 | call to malloc | test.c:63:24:63:33 | alloc_size | test.c:75:9:75:27 | access to array | test.c:75:13:75:22 | alloc_size | test.c:51:19:51:28 | alloc_size | 0 | test.c:51:19:51:28 | alloc_size | 2 | diff --git a/session/example8a.ql b/session/example8a.ql index 90a690f..746cde2 100644 --- a/session/example8a.ql +++ b/session/example8a.ql @@ -29,16 +29,20 @@ where // Same initializer variable bufferBase.(VariableAccess).getTarget() = bufInit and accessBase.(VariableAccess).getTarget() = accessInit and - bufInit = accessInit -// +++ -// Identify questionable differences -select buffer, bufferBase, bufferOffset, access, accessBase, accessOffset, bufInit, accessInit + bufInit = accessInit and + // +++ + // Identify questionable differences + accessOffset >= bufferOffset +select buffer, bufferBase, access, accessBase, bufInit, bufferOffset, accessInit, accessOffset /** * Extract base and offset from y = 
base+offset and y = base-offset. For others, get y and 0. * * For cases like * buf[alloc_size + 1]; + * ^^^^^^^^^^^^^^ expr + * ^^^^^^^^^^ base + * ^^^ offset * * The more general * buf[sz * x * y - 1]; diff --git a/session/session.org b/session/session.org index 19540d0..30f89a6 100644 --- a/session/session.org +++ b/session/session.org @@ -446,19 +446,6 @@ To address these, take the query from the previous exercise and - This only handles very specific cases. Constructing counterexamples is easy. - We will address this in the next section. -*** TODO incoporate - #+BEGIN_SRC java - predicate isOffsetOutOfBoundsConstant( - ArrayExpr access, FunctionCall source, int allocSize, int accessOffset - ) { - ensureSameFunction(access, source) and - // allocatedBufferArrayAccess(access, source) and - allocSize = getMaxStatedValue(source.getArgument(0)) and - accessOffset = getFixedArrayOffset(access) and - accessOffset >= allocSize - } - #+END_SRC - *** Solution: #+INCLUDE: "example8.ql" src java @@ -482,9 +469,25 @@ To address these, take the query from the previous exercise and constants that flow to a given expression. Another approach is global value numbering, used next. +** Step 8a + Find problematic accesses by reverting to some /simple/ =var+const= checks using + =accessOffset= and =bufferOffset=. + + Note: + - These will flag some false positives. + - The product expression =sz * x * y= is not easily checked for equality. + These are addressed in the next step. + +*** Solution: + #+INCLUDE: "example8a.ql" src java + +*** First 5 results + #+INCLUDE: "../session-tests/example8a/example8a.expected" :lines "-6"’ + ** Step 9 -- Global Value Numbering - Range analyis won't bound =sz * x * y=, so switch to global value - numbering. + Range analysis won't bound =sz * x * y=, and simple equality checks don't work at + the structure level, so switch to global value numbering.
+ This is the case in the last test case, #+begin_example void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) @@ -524,15 +527,6 @@ To address these, take the query from the previous exercise and #+end_example we have to "evaluate" the expressions -- or at least bound them. -*** DONE incorporate - Done by =ensureSameFunction= instead. - #+BEGIN_SRC java - predicate allocatedBufferArrayAccess(ArrayExpr access, FunctionCall alloc) { - alloc.getTarget().hasName("malloc") and - DataFlow::localExprFlow(alloc, access.getArrayBase()) - } - #+END_SRC - *** TODO incoporate #+BEGIN_SRC java int getFixedArrayOffset(ArrayExpr access) { From ea064e5fde96eea7e39a3a57f01720ab69e26730 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 23 May 2023 12:30:50 -0700 Subject: [PATCH 24/28] More on step 9, global value numbering --- session-tests/Example9/example9.expected | 17 +- session/example8a.ql | 2 +- session/example9.ql | 44 +- session/session.md | 847 ++++++++++++----------- session/session.org | 141 +--- 5 files changed, 499 insertions(+), 552 deletions(-) diff --git a/session-tests/Example9/example9.expected b/session-tests/Example9/example9.expected index 2936e3d..c7fec59 100644 --- a/session-tests/Example9/example9.expected +++ b/session-tests/Example9/example9.expected @@ -1,8 +1,9 @@ -| test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:15:26:15:28 | GVN | -| test.c:38:5:38:12 | access to array | test.c:26:39:26:41 | GVN | test.c:26:39:26:41 | GVN | -| test.c:69:5:69:19 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | GVN | -| test.c:73:9:73:23 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | GVN | -| test.c:74:9:74:27 | access to array | test.c:74:13:74:26 | GVN | test.c:63:24:63:33 | GVN | -| test.c:75:9:75:27 | access to array | test.c:75:13:75:26 | GVN | test.c:63:24:63:33 | GVN | -| test.c:83:5:83:19 | access to array | test.c:81:24:81:33 | GVN | test.c:81:24:81:33 | GVN | -| 
test.c:84:5:84:23 | access to array | test.c:84:9:84:22 | GVN | test.c:81:24:81:33 | GVN | +| test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:15:26:15:28 | 100 | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | 0 | +| test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:16:24:16:27 | size | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | 0 | +| test.c:38:5:38:12 | access to array | test.c:26:39:26:41 | GVN | test.c:26:39:26:41 | 100 | test.c:28:24:28:27 | size | test.c:26:39:26:41 | GVN | test.c:38:9:38:11 | 100 | 0 | +| test.c:69:5:69:19 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | GVN | test.c:69:9:69:18 | alloc_size | 0 | +| test.c:73:9:73:23 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | GVN | test.c:73:13:73:22 | alloc_size | 0 | +| test.c:74:9:74:27 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:74:13:74:26 | GVN | test.c:74:13:74:26 | ... + ... | 1 | +| test.c:75:9:75:27 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:75:13:75:26 | GVN | test.c:75:13:75:26 | ... + ... | 2 | +| test.c:83:5:83:19 | access to array | test.c:81:24:81:33 | GVN | test.c:81:24:81:33 | ... * ... | test.c:81:24:81:33 | ... * ... | test.c:81:24:81:33 | GVN | test.c:83:9:83:18 | ... * ... | 0 | +| test.c:84:5:84:23 | access to array | test.c:81:24:81:33 | GVN | test.c:81:24:81:33 | ... * ... | test.c:81:24:81:33 | ... * ... | test.c:84:9:84:22 | GVN | test.c:84:9:84:22 | ... + ... 
| 1 | diff --git a/session/example8a.ql b/session/example8a.ql index 746cde2..cdd46aa 100644 --- a/session/example8a.ql +++ b/session/example8a.ql @@ -18,7 +18,7 @@ where bufferSizeExpr = buffer.getSizeExpr() and // Ensure buffer access refers to the matching allocation DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // Ensure buffer access refers to the matching allocation + // Find allocation size expression flowing to buffer. DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and // // +++ diff --git a/session/example9.ql b/session/example9.ql index 382ea0b..0d7ceba 100644 --- a/session/example9.ql +++ b/session/example9.ql @@ -1,49 +1,41 @@ import cpp import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis import semmle.code.cpp.valuenumbering.GlobalValueNumbering -// Step 9 from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, GVN gvnAccess, - GVN gvnAlloc + AllocationExpr buffer, ArrayExpr access, + // --- + // Expr bufferSizeExpr + // int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, + // +++ + Expr allocSizeExpr, Expr accessIdx, GVN gvnAccessIdx, GVN gvnAllocSizeExpr, int accessOffset where // malloc (100) // ^^^^^^^^^^^^ AllocationExpr buffer - // // buf[...] // ^^^ ArrayExpr access // buf[...] // ^^^ accessIdx accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // unsigned long size = 100; - // ... - // char *buf = malloc(size); + // Find allocation size expression flowing to the allocation. DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) and - // char *buf = ... 
buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // + // Ensure buffer access refers to the matching allocation DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // // Use GVN - globalValueNumber(accessIdx) = gvnAccess and - globalValueNumber(allocSizeExpr) = gvnAlloc and + globalValueNumber(accessIdx) = gvnAccessIdx and + globalValueNumber(allocSizeExpr) = gvnAllocSizeExpr and ( - gvnAccess = gvnAlloc + // buf[size] or buf[100] + gvnAccessIdx = gvnAllocSizeExpr and + accessOffset = 0 or - // buf[sz * x * y] above // buf[sz * x * y + 1]; exists(AddExpr add | accessIdx = add and - // add.getAnOperand() = accessIdx and - add.getAnOperand().getValue().toInt() > 0 and - globalValueNumber(add.getAnOperand()) = gvnAlloc + accessOffset >= 0 and + accessOffset = add.getRightOperand().(Literal).getValue().toInt() and + globalValueNumber(add.getLeftOperand()) = gvnAllocSizeExpr ) ) -select access, gvnAccess, gvnAlloc +select access, gvnAllocSizeExpr, allocSizeExpr, buffer.getSizeExpr() as allocArg, gvnAccessIdx, + accessIdx, accessOffset diff --git a/session/session.md b/session/session.md index e66a0df..c89bd0e 100644 --- a/session/session.md +++ b/session/session.md @@ -6,48 +6,46 @@ - [Session/Workshop notes](#sessionworkshop-notes) - [Step 1](#exercise-1) - [Hints](#hints) - - [Solution](#org2fc84f1) - - [Step 2](#org81cc6bb) + - [Solution](#orgf0fce94) + - [Step 2](#orgebf05ab) - [Hints](#hints) - - [Solution](#orgd82dd3e) - - [Results](#orgb9c5185) + - [Solution](#org0096fb0) + - [Results](#org8fddea8) - [Step 3](#exercise-2) - - [Solution](#orgbfcc6e8) - - [Results](#org77d93e8) - - [Step 4](#orgcac2df5) - - [Hint](#orge73d8c3) - - [Solution](#orgcce7aa5) - - [Results](#org58fd83d) - - [Step 4a – some clean-up using predicates](#org1a3052f) - - [Solution](#orgf922609) - - [Step 5 – SimpleRangeAnalysis](#org0df2f23) - - [Solution](#orgb23c26e) - - [First 5 results](#org921d64a) - - [Step 6](#org2b0d3ac) - - 
[Solution](#orgdd0881f) - - [First 5 results](#org3e7d47c) - - [Step 7](#org00edfe5) - - [Solution:](#org8a3a4b1) - - [First 5 results](#org2f15e3e) - - [Step 7a](#orgfa97dcd) - - [Solution:](#org3894df3) - - [First 5 results](#orgcbdf216) - - [Step 7b](#org58aba89) - - [incoporate](#orgd60c31b) - - [incoporate](#org3319200) - - [Step 8](#orgcfcb55c) - - [Solution:](#orgede6c66) - - [Results](#orgfa54b95) - - [Interim notes](#org68d8dfb) - - [Step 9 – Global Value Numbering](#orge1acc6c) - - [incorporate](#orgb19cfc3) - - [incorporate](#orgda109c3) - - [incoporate](#org8d0c13d) - - [incoporate](#org45411bb) - - [incoporate](#org364861b) - - [interim](#orgc0ae12b) - - [interim](#orgb2f39ee) - - [hashconsing](#org6332f3e) + - [Solution](#org3698b58) + - [Results](#org00c9191) + - [Step 4](#org5cda1ae) + - [Hint](#orge721980) + - [Solution](#orgc515483) + - [Results](#org57d3bda) + - [Step 4a – some clean-up using predicates](#org0982d85) + - [Solution](#orgb26e7e5) + - [Step 5 – SimpleRangeAnalysis](#orgb82a277) + - [Solution](#orgad1e7b0) + - [First 5 results](#org2740a32) + - [Step 6](#orgd745b01) + - [Solution](#org8d5d426) + - [First 5 results](#orgebf1dcb) + - [Step 7](#org815886d) + - [Solution](#orgfc8c990) + - [First 5 results](#org3b1615b) + - [Step 7a](#orgc92b420) + - [Solution](#org34ac413) + - [First 5 results](#org6ad3b1e) + - [Step 7b](#org9cc9de8) + - [Solution](#orge60a202) + - [First 5 results](#orgcb20f8d) + - [Step 8](#org2bd04bc) + - [Solution](#org06f95fe) + - [First 5 results](#org46e475c) + - [Interim notes](#orgef00db2) + - [Step 8a](#org3d75997) + - [Solution](#orgf75a887) + - [First 5 results](#orgb242afe) + - [Step 9 – Global Value Numbering](#org0abea3c) + - [Solution](#orga707c2b) + - [First 5 results](#orgd058b22) + - [hashconsing](#org2e8a639) @@ -165,9 +163,9 @@ To find these issues, 1. We can implement an analysis that tracks the upper or lower bounds on an expression. 2. 
We then combine this with data-flow analysis to reduce false positives and identify cases where the index of the array results in an access beyond the allocated size of the buffer. 3. We further extend these queries with rudimentary arithmetic support involving expressions common to the allocation and the array access. -4. For cases where constant expressions are not available or are uncertain, we first try [range analysis](#org0df2f23) to expand the query's applicability. -5. For cases where this is insufficient, we introduce global value numbering [GVN](https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering) in [Step 9 – Global Value Numbering](#orge1acc6c), to detect values known to be equal at runtime. -6. When *those* cases are insufficient, we handle the case of identical structure using [hashconsing](#org6332f3e). +4. For cases where constant expressions are not available or are uncertain, we first try [range analysis](#orgb82a277) to expand the query's applicability. +5. For cases where this is insufficient, we introduce global value numbering [GVN](https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering) in [Step 9 – Global Value Numbering](#org0abea3c), to detect values known to be equal at runtime. +6. When *those* cases are insufficient, we handle the case of identical structure using [hashconsing](#org2e8a639). @@ -199,7 +197,7 @@ in [db.c](file:///Users/hohn/local/codeql-workshop-runtime-values-c/session-db/D 1. `Expr::getValue()::toInt()` can be used to get the integer value of a constant expression. - + ### Solution @@ -231,7 +229,7 @@ select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSize This produces 12 results, with some cross-function pairs. - + ## Step 2 @@ -251,7 +249,7 @@ To address these, take the query from the previous exercise and 2. The the array base is the `buf` part of `buf[0]`. Use the `Expr.getArrayBase()` predicate. 
- + ### Solution @@ -281,7 +279,7 @@ where allocSizeExpr = buffer.(Call).getArgument(0) and bufferSize = allocSizeExpr.getValue().toInt() and // - // Ensure alloc and buffer access are in the same function + // Ensure buffer access is to the correct allocation. // char *buf = ... buf[0]; // ^^^ ---> ^^^ // or @@ -293,7 +291,7 @@ select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSize ``` - + ### Results @@ -321,7 +319,7 @@ Here, the `malloc` argument is a variable with known value. We include this result by removing the size-retrieval from the prior query. - + ### Solution @@ -350,7 +348,7 @@ where allocSizeExpr = buffer.(Call).getArgument(0) and // bufferSize = allocSizeExpr.getValue().toInt() and // - // Ensure alloc and buffer access are in the same function + // Ensure buffer access is to the correct allocation. // char *buf = ... buf[0]; // ^^^ ---> ^^^ // or @@ -362,14 +360,14 @@ select buffer, access, accessIdx, access.getArrayOffset() ``` - + ### Results Now, we get 12 results, including some from other test cases. - + ## Step 4 @@ -382,12 +380,12 @@ Note the results for the cases in `test_const_var` which involve a variable acce We have an expression `size` that flows into the `malloc()` call. - + ### Hint - + ### Solution @@ -423,7 +421,7 @@ where bufferSizeExpr.getValue().toInt() = bufferSize and bse = bufferSizeExpr ) and - // Ensure alloc and buffer access are in the same function + // Ensure buffer access is to the correct allocation. // char *buf = ... buf[0]; // ^^^ ---> ^^^ // or @@ -435,14 +433,14 @@ select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bse ``` - + ### Results Now, we get 15 results, limited to statically determined values. - + ## Step 4a – some clean-up using predicates @@ -470,7 +468,7 @@ Also, simplify the `from...where...select`: 2. Use `DataFlow::localExprFlow` for the buffer and allocation sizes, with `getValue().toInt()` as one possibility (one predicate). 
- + ### Solution @@ -491,16 +489,15 @@ where // accessIdx = access.getArrayOffset().getValue().toInt() and getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure alloc and buffer access are in the same function - ensureSameFunction(buffer, access.getArrayBase()) and - // Ensure size defintion and use are in same function, even for non-constant expressions. - ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) -// + // Ensure buffer access refers to the matching allocation + // ensureSameFunction(buffer, access.getArrayBase()) and + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure buffer access refers to the matching allocation + // ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) + // select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bufferSizeExpr -/** Ensure the two expressions are in the same function body. */ -predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - /** * Gets an expression that flows to the allocation (which includes those already in the allocation) * and has a constant value. @@ -524,7 +521,7 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { ``` - + ## Step 5 – SimpleRangeAnalysis @@ -551,7 +548,7 @@ Notes: select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax - + ### Solution @@ -578,16 +575,14 @@ where // ^^^ allocSizeExpr / bufferSize // getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure alloc and buffer access are in the same function - ensureSameFunction(buffer, access.getArrayBase()) and - // Ensure size defintion and use are in same function, even for non-constant expressions. - ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) -// + // Ensure buffer access is to the correct allocation. 
+ DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size defintion, even for non-constant + // expressions. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) + // select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax -/** Ensure the two expressions are in the same function body. */ -predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - /** * Gets an expression that flows to the allocation (which includes those already in the allocation) * and has a constant value. @@ -611,7 +606,7 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { ``` - + ### First 5 results @@ -622,7 +617,7 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { | test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | - + ## Step 6 @@ -642,7 +637,7 @@ Hints: 3. These test cases all use type `char`. What would happen for `int` or `double`? - + ### Solution @@ -668,19 +663,17 @@ where // ^^^ allocSizeExpr / bufferSize // getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure alloc and buffer access are in the same function - ensureSameFunction(buffer, access.getArrayBase()) and - // Ensure size defintion and use are in same function, even for non-constant expressions. - ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) -// + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size defintion, even for non-constant + // expressions. 
+ DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) + // select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, 1 as allocBaseSize -/** Ensure the two expressions are in the same function body. */ -predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - /** * Gets an expression that flows to the allocation (which includes those already in the allocation) * and has a constant value. @@ -704,7 +697,7 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { ``` - + ### First 5 results @@ -715,7 +708,7 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { | test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | | char | 1 | 1 | - + ## Step 7 @@ -727,9 +720,9 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { 3. Compare them - + -### Solution: +### Solution ```java import cpp @@ -752,10 +745,11 @@ where // ^^^ allocSizeExpr / bufferSize // getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure alloc and buffer access are in the same function - ensureSameFunction(buffer, access.getArrayBase()) and - // Ensure size defintion and use are in same function, even for non-constant expressions. - ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size defintion, even for non-constant + // expressions. 
+ DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and // arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and 1 = allocBaseSize @@ -764,9 +758,6 @@ select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as acces access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex -/** Ensure the two expressions are in the same function body. */ -predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - /** * Gets an expression that flows to the allocation (which includes those already in the allocation) * and has a constant value. @@ -786,7 +777,7 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { ``` - + ### First 5 results @@ -797,7 +788,7 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { | test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | | char | 100 | 99.0 | - + ## Step 7a @@ -805,9 +796,9 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { 2. Put all expressions into the select for review. - + -### Solution: +### Solution ```java import cpp @@ -825,10 +816,11 @@ where // ^^^ int accessIdx accessIdx = access.getArrayOffset() and getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure alloc and buffer access are in the same function - ensureSameFunction(buffer, access.getArrayBase()) and - // Ensure size defintion and use are in same function, even for non-constant expressions. - ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size defintion, even for non-constant + // expressions. 
+ DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and // arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and 1 = allocBaseSize @@ -839,9 +831,6 @@ select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as acces arrayBaseType.getSize() as arrayBaseTypeSize, allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex -/** Ensure the two expressions are in the same function body. */ -predicate ensureSameFunction(Expr a, Expr b) { DataFlow::localExprFlow(a, b) } - /** * Gets an expression that flows to the allocation (which includes those already in the allocation) * and has a constant value. @@ -861,7 +850,7 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { ``` - + ### First 5 results @@ -872,148 +861,272 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { | test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | | char | 1 | 1 | 100 | 99.0 | - + ## Step 7b -Introduce more general predicates. - -1. Move these into a single predicate, `isOffsetOutOfBoundsConstant` +1. Introduce more general predicates. +2. Compare buffer allocation size to the access index. +3. Report only the questionable entries. - + -### TODO incoporate +### Solution ```java +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + +from + AllocationExpr buffer, ArrayExpr access, int bufferSize, Expr bufferSizeExpr, + int maxAccessedIndex, int allocatedUnits +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size defintion, even for non-constant + // expressions. 
+ DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + // computeIndices(access, buffer, bufferSize, allocatedUnits, maxAccessedIndex) + computeAllocationSize(buffer, bufferSize, allocatedUnits) and + computeMaxAccess(access, maxAccessedIndex) + // only consider out-of-bounds + and + maxAccessedIndex >= allocatedUnits +select access, + "Array access at or beyond size; have " + allocatedUnits + " units, access at " + maxAccessedIndex + +// select bufferSizeExpr, buffer, access, allocatedUnits, maxAccessedIndex + /** - * Gets the smallest of the upper bound of `e` or the largest source value - * (i.e. "stated value") that flows to `e`. Because range-analysis can over-widen - * bounds, take the minimum of range analysis and data-flow sources. - * - * If there is no source value that flows to `e`, this predicate does not hold. - * - * This predicate, if `e` is the `sz` arg to `malloc`, would return `20` for the - * following: - * - * size_t sz = condition ? 10 : 20; - * malloc(sz); - * + * Compute the maximum accessed index. */ -bindingset[e] -int getMaxStatedValue(Expr e) { - result = upperBound(e).minimum(max(getSourceConstantExpr(e).getValue().toInt())) +predicate computeMaxAccess(ArrayExpr access, int maxAccessedIndex) { + exists( + int arrayTypeSize, int accessMax, Type arrayBaseType, int arrayBaseTypeSize, Expr accessIdx + | + // buf[...] + // ^^^^^^^^ ArrayExpr access + // ^^^ + accessIdx = access.getArrayOffset() and + upperBound(accessIdx) = accessMax and + arrayBaseType.getSize() = arrayBaseTypeSize and + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() = arrayBaseType and + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + arrayTypeSize * accessMax = maxAccessedIndex + ) } -``` - - - -### TODO incoporate +/** + * Compute the allocation size. 
+ */ +bindingset[bufferSize] +predicate computeAllocationSize(AllocationExpr buffer, int bufferSize, int allocatedUnits) { + exists(int bufferBaseTypeSize, Type arrayBaseType, int arrayBaseTypeSize | + // buf[...] + // ^^^^^^^^ ArrayExpr access + // ^^^ + buffer.getSizeMult() = bufferBaseTypeSize and + arrayBaseType.getSize() = arrayBaseTypeSize and + bufferSize * bufferBaseTypeSize = allocatedUnits + ) +} -```java -predicate isOffsetOutOfBoundsConstant( - ArrayExpr access, FunctionCall source, int allocSize, int accessOffset +/** + * Compute the allocation size and the maximum accessed index for the allocation and access. + */ +bindingset[bufferSize] +predicate computeIndices( + ArrayExpr access, AllocationExpr buffer, int bufferSize, int allocatedUnits, int maxAccessedIndex ) { - ensureSameFunction(access, source) and - // allocatedBufferArrayAccess(access, source) and - allocSize = getMaxStatedValue(source.getArgument(0)) and - accessOffset = getFixedArrayOffset(access) and - accessOffset >= allocSize + exists( + int arrayTypeSize, int accessMax, int bufferBaseTypeSize, Type arrayBaseType, + int arrayBaseTypeSize, Expr accessIdx + | + // buf[...] + // ^^^^^^^^ ArrayExpr access + // ^^^ + accessIdx = access.getArrayOffset() and + upperBound(accessIdx) = accessMax and + buffer.getSizeMult() = bufferBaseTypeSize and + arrayBaseType.getSize() = arrayBaseTypeSize and + bufferSize * bufferBaseTypeSize = allocatedUnits and + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() = arrayBaseType and + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + arrayTypeSize * accessMax = maxAccessedIndex + ) +} + +/** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ +predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // Capture BOTH with datflow: + // 1. 
malloc (100) + // ^^^ bufferSize + // 2. unsigned long size = 100; ... ; char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) } ``` - + + +### First 5 results + +WARNING: Unused predicate computeIndices (/Users/hohn/local/codeql-workshop-runtime-values-c/session/example7b.ql:66,11-25) + +| test.c:10:5:10:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | +| test.c:20:5:20:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | +| test.c:21:5:21:13 | access to array | Array access at or beyond size; have 100 units, access at 100 | +| test.c:37:5:37:17 | access to array | Array access at or beyond size; have 100 units, access at 299 | + + + ## Step 8 -1. Clean up the query. -2. Compare buffer allocation size to the access index. -3. Report only the questionable entries. -4. Use +Up to now, we have dealt with constant values + +```c++ +char *buf = malloc(100); +buf[0]; // COMPLIANT +``` + +or + +```c++ +unsigned long size = 100; +char *buf = malloc(size); +buf[0]; // COMPLIANT +``` + +and statically determinable or boundable values + +```c++ +char *buf = malloc(size); +if (size < 199) + { + buf[size]; // COMPLIANT + // ... + } +``` + +There is another statically determinable case. Examples are + +1. A simple expression - ```java - /** - * @kind problem - */ + ```c++ + char *buf = malloc(alloc_size); + // ... + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT ``` +2. A complex expression - to get nicer reporting. + ```c++ + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + ``` +These both have the form `malloc(e)`, `buf[e+c]`, where `e` is an `Expr` and `c` is a constant, possibly 0. Our existing queries only report known or boundable results, but here `e` is neither. - +Write a new query, re-using or modifying the existing one to handle the simple expression (case 1). 
-### Solution: +Note: -```java -/** - * @kind problem - */ +- We are looking at the allocation expression again, not its possible value. +- This only handles very specific cases. Constructing counterexamples is easy. +- We will address this in the next section. + + + + +### Solution +```java import cpp import semmle.code.cpp.dataflow.DataFlow import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -// Step 8 from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize, int accessMax, - int allocatedUnits, int maxAccessedIndex + AllocationExpr buffer, ArrayExpr access, Expr bufferSizeExpr, + // --- + // int maxAccessedIndex, int allocatedUnits, + // int bufferSize + int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, + Variable accessInit where - // malloc (100) + // malloc (...) // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // buf[...] - // ^^^ int accessIdx - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); + // --- + // getAllocConstExpr(...) + // +++ + bufferSizeExpr = buffer.getSizeExpr() and + // Ensure buffer access refers to the matching allocation + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure buffer access refers to the matching allocation DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... 
buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ // - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and - 1 = allocBaseSize and - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - upperBound(accessIdx) = accessMax and - allocBaseSize * allocsize = allocatedUnits and - arrayTypeSize * accessMax = maxAccessedIndex and - // only consider out-of-bounds - maxAccessedIndex >= allocatedUnits -select access, "Array access at or beyond size; have "+allocatedUnits + " units, access at "+ maxAccessedIndex -``` + // +++ + // base+offset + extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) and + extractBaseAndOffset(access.getArrayOffset(), accessBase, accessOffset) and + // +++ + // Same initializer variable + bufferBase.(VariableAccess).getTarget() = bufInit and + accessBase.(VariableAccess).getTarget() = accessInit and + bufInit = accessInit +// +++ +// Identify questionable differences +select buffer, bufferBase, bufferOffset, access, accessBase, accessOffset, bufInit, accessInit +/** + * Extract base and offset from y = base+offset and y = base-offset. For others, get y and 0. + * + * For cases like + * buf[alloc_size + 1]; + * + * The more general + * buf[sz * x * y - 1]; + * requires other tools. 
+ */ +bindingset[expr] +predicate extractBaseAndOffset(Expr expr, Expr base, int offset) { + offset = expr.(AddExpr).getRightOperand().getValue().toInt() and + base = expr.(AddExpr).getLeftOperand() + or + offset = -expr.(SubExpr).getRightOperand().getValue().toInt() and + base = expr.(SubExpr).getLeftOperand() + or + not expr instanceof AddExpr and + not expr instanceof SubExpr and + base = expr and + offset = 0 +} +``` - -### Results + -14 results in the much cleaner table +### First 5 results -| | | -|------------------------------------------------------------- |--------- | -| Array access at or beyond size; have 200 units, access at 200 | db.c:67:5 | +| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:19:5:19:17 | access to array | test.c:19:9:19:12 | size | -1 | test.c:15:19:15:22 | size | test.c:15:19:15:22 | size | +| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | 0 | test.c:15:19:15:22 | size | test.c:15:19:15:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:37:5:37:17 | access to array | test.c:37:9:37:12 | size | -1 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | - + ## Interim notes @@ -1026,268 +1139,210 @@ int val = rand() ? rand() : 30; A similar case is present in the `test_const_branch` and `test_const_branch2` test-cases. 
In these cases, it is necessary to augment range analysis with data-flow and restrict the bounds to the upper or lower bound of computable constants that flow to a given expression. Another approach is global value numbering, used next. - + -## Step 9 – Global Value Numbering +## Step 8a -Range analyis won't bound `sz * x * y`, so switch to global value numbering. This is the case in the last test case, +Find problematic accesses by reverting to some *simple* `var+const` checks using `accessOffset` and `bufferOffset`. - void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) - { - char *buf = malloc(sz * x * y); - buf[sz * x * y - 1]; // COMPLIANT - buf[sz * x * y]; // NON_COMPLIANT - buf[sz * x * y + 1]; // NON_COMPLIANT - } - -Reference: - -Global value numbering only knows that runtime values are equal; they are not comparable (`<, >, <=` etc.), and the *actual* value is not known. - -Global value numbering finds expressions with the same known value, independent of structure. - -So, we look for and use *relative* values between allocation and use. - -The relevant CodeQL constructs are - -```java -import semmle.code.cpp.valuenumbering.GlobalValueNumbering -... -globalValueNumber(e) = globalValueNumber(sizeExpr) and -e != sizeExpr -... -``` - -We can use global value numbering to identify common values as first step, but for expressions like - - buf[sz * x * y - 1]; // COMPLIANT +Note: -we have to "evaluate" the expressions – or at least bound them. +- These will flag some false positives. +- The product expression `sz * x * y` is not easily checked for equality. +These are addressed in the next step. - -### DONE incorporate + -Done by `ensureSameFunction` instead. 
+### Solution ```java -predicate allocatedBufferArrayAccess(ArrayExpr access, FunctionCall alloc) { - alloc.getTarget().hasName("malloc") and - DataFlow::localExprFlow(alloc, access.getArrayBase()) -} -``` - - - +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -### TODO incorporate +from + AllocationExpr buffer, ArrayExpr access, Expr bufferSizeExpr, + // --- + // int maxAccessedIndex, int allocatedUnits, + // int bufferSize + int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, + Variable accessInit +where + // malloc (...) + // ^^^^^^^^^^^^ AllocationExpr buffer + // --- + // getAllocConstExpr(...) + // +++ + bufferSizeExpr = buffer.getSizeExpr() and + // Ensure buffer access refers to the matching allocation + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Find allocation size expression flowing to buffer. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + // + // +++ + // base+offset + extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) and + extractBaseAndOffset(access.getArrayOffset(), accessBase, accessOffset) and + // +++ + // Same initializer variable + bufferBase.(VariableAccess).getTarget() = bufInit and + accessBase.(VariableAccess).getTarget() = accessInit and + bufInit = accessInit and + // +++ + // Identify questionable differences + accessOffset >= bufferOffset +select buffer, bufferBase, access, accessBase, bufInit, bufferOffset, accessInit, accessOffset -```java +/** + * Extract base and offset from y = base+offset and y = base-offset. For others, get y and 0. + * + * For cases like + * buf[alloc_size + 1]; + * ^^^^^^^^^^^^^^ expr + * ^^^^^^^^^^ base + * ^^^ offset + * + * The more general + * buf[sz * x * y - 1]; + * requires other tools. 
+ */ bindingset[expr] -int getExprOffsetValue(Expr expr, Expr base) { - result = expr.(AddExpr).getRightOperand().getValue().toInt() and +predicate extractBaseAndOffset(Expr expr, Expr base, int offset) { + offset = expr.(AddExpr).getRightOperand().getValue().toInt() and base = expr.(AddExpr).getLeftOperand() or - result = -expr.(SubExpr).getRightOperand().getValue().toInt() and + offset = -expr.(SubExpr).getRightOperand().getValue().toInt() and base = expr.(SubExpr).getLeftOperand() or - // currently only AddExpr and SubExpr are supported: else, fall-back to 0 not expr instanceof AddExpr and not expr instanceof SubExpr and base = expr and - result = 0 + offset = 0 } ``` - - -### TODO incoporate + -```java -int getFixedArrayOffset(ArrayExpr access) { - exists(Expr base, int offset | - offset = getExprOffsetValue(access.getArrayOffset(), base) and - result = getMaxStatedValue(base) + offset - ) -} -``` +### First 5 results +| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | test.c:15:19:15:22 | size | 0 | test.c:15:19:15:22 | size | 0 | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 0 | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 0 | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:44:9:44:21 | access to array | test.c:44:13:44:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 1 | +| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:45:9:45:21 | access to array | test.c:45:13:45:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 2 | - -### TODO incoporate - -```java -predicate 
isOffsetOutOfBoundsGVN(ArrayExpr access, FunctionCall source) { - ensureSameFunction(access, source) and - not isOffsetOutOfBoundsConstant(access, source, _, _) and - exists(Expr accessOffsetBase, int accessOffsetBaseValue | - accessOffsetBaseValue = getExprOffsetValue(access.getArrayOffset(), accessOffsetBase) and - globalValueNumber(source.getArgument(0)) = globalValueNumber(accessOffsetBase) and - not accessOffsetBaseValue < 0 - ) -} -``` + +## Step 9 – Global Value Numbering - +Range analyis won't bound `sz * x * y`, and simple equality checks don't work at the structure level, so switch to global value numbering. -### TODO incoporate +This is the case in the last test case, -```java -/** - * @id cpp/array-access-out-of-bounds - * @description Access of an array with an index that is greater or equal to the element num. - * @kind problem - * @problem.severity error - */ + void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) + { + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT + } -import cpp -import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.valuenumbering.GlobalValueNumbering -import RuntimeValues +Reference: -from FunctionCall source, ArrayExpr access, string message -where - exists(int allocSize, int accessOffset | - isOffsetOutOfBoundsConstant(access, source, allocSize, accessOffset) and - message = - "Array access out of bounds: " + access.toString() + " with offset " + accessOffset.toString() - + " on $@ with size " + allocSize.toString() - ) - or - isOffsetOutOfBoundsGVN(access, source) and - message = "Array access with index that is greater or equal to the size of the $@." -select access, message, source, "allocation" -``` +Global value numbering only knows that runtime values are equal; they are not comparable (`<, >, <=` etc.), and the *actual* value is not known. 
+Global value numbering finds expressions with the same known value, independent of structure. - +So, we look for and use *relative* values between allocation and use. -### interim +The relevant CodeQL constructs are ```java -/** - * @ kind problem - */ - -import cpp -import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis import semmle.code.cpp.valuenumbering.GlobalValueNumbering - -// Step 9 -from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize, int accessMax, - int allocatedUnits, int maxAccessedIndex -where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // buf[...] - // ^^^ int accessIdx - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... 
buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and - 1 = allocBaseSize and - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - upperBound(accessIdx) = accessMax and - allocBaseSize * allocsize = allocatedUnits and - arrayTypeSize * accessMax = maxAccessedIndex and - // only consider out-of-bounds - maxAccessedIndex >= allocatedUnits -select access, - "Array access at or beyond size; have " + allocatedUnits + " units, access at " + maxAccessedIndex, - globalValueNumber(accessIdx) as gvnAccess, globalValueNumber(allocSizeExpr) as gvnAlloc +... +globalValueNumber(e) = globalValueNumber(sizeExpr) and +e != sizeExpr +... ``` +We can use global value numbering to identify common values as first step, but for expressions like - + buf[sz * x * y - 1]; // COMPLIANT + +we have to "evaluate" the expressions – or at least bound them. + +XX: For the cases with variable `malloc` sizes, like `test_const_branch`, GVN identifies same-value constant accesses, but we need a special case for same-value expression accesses. -### interim -Messy, start over. + + +### Solution ```java import cpp import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis import semmle.code.cpp.valuenumbering.GlobalValueNumbering -// Step 9 from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, GVN gvnAccess, - GVN gvnAlloc + AllocationExpr buffer, ArrayExpr access, + // --- + // Expr bufferSizeExpr + // int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, + // +++ + Expr allocSizeExpr, Expr accessIdx, GVN gvnAccessIdx, GVN gvnAllocSizeExpr, int accessOffset where // malloc (100) // ^^^^^^^^^^^^ AllocationExpr buffer - // // buf[...] // ^^^ ArrayExpr access // buf[...] 
// ^^^ accessIdx accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // unsigned long size = 100; - // ... - // char *buf = malloc(size); + // Find allocation size expression flowing to the allocation. DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // + // Ensure buffer access refers to the matching allocation DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // // Use GVN - globalValueNumber(accessIdx) = gvnAccess and - globalValueNumber(allocSizeExpr) = gvnAlloc and + globalValueNumber(accessIdx) = gvnAccessIdx and + globalValueNumber(allocSizeExpr) = gvnAllocSizeExpr and ( - gvnAccess = gvnAlloc + // buf[size] or buf[100] + gvnAccessIdx = gvnAllocSizeExpr and + accessOffset = 0 or - // buf[sz * x * y] above // buf[sz * x * y + 1]; exists(AddExpr add | accessIdx = add and - // add.getAnOperand() = accessIdx and - add.getAnOperand().getValue().toInt() > 0 and - globalValueNumber(add.getAnOperand()) = gvnAlloc + accessOffset >= 0 and + accessOffset = add.getRightOperand().(Literal).getValue().toInt() and + globalValueNumber(add.getLeftOperand()) = gvnAllocSizeExpr ) ) -select access, gvnAccess, gvnAlloc +select access, gvnAllocSizeExpr, allocSizeExpr, buffer.getSizeExpr() as allocArg, gvnAccessIdx, + accessIdx, accessOffset ``` - + + +### First 5 results + +Results note: + +- The allocation size of 200 is never used in an access, so the GVN match eliminates it from the result list. 
+ + | test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:15:26:15:28 | 100 | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | 0 | + | test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:16:24:16:27 | size | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | 0 | + | test.c:38:5:38:12 | access to array | test.c:26:39:26:41 | GVN | test.c:26:39:26:41 | 100 | test.c:28:24:28:27 | size | test.c:26:39:26:41 | GVN | test.c:38:9:38:11 | 100 | 0 | + | test.c:69:5:69:19 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | allocsize | test.c:63:24:63:33 | allocsize | test.c:63:24:63:33 | GVN | test.c:69:9:69:18 | allocsize | 0 | + | test.c:73:9:73:23 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | allocsize | test.c:63:24:63:33 | allocsize | test.c:63:24:63:33 | GVN | test.c:73:13:73:22 | allocsize | 0 | + + + ## TODO hashconsing diff --git a/session/session.org b/session/session.org index 30f89a6..19faf0c 100644 --- a/session/session.org +++ b/session/session.org @@ -372,7 +372,7 @@ To address these, take the query from the previous exercise and 2. Calculate the =allocSize= / =allocatedUnits= (from the malloc) 3. Compare them -*** Solution: +*** Solution #+INCLUDE: "example7.ql" src java *** First 5 results @@ -382,7 +382,7 @@ To address these, take the query from the previous exercise and 1. Account for base sizes -- =char= in this case. 2. Put all expressions into the select for review. -*** Solution: +*** Solution #+INCLUDE: "example7a.ql" src java *** First 5 results @@ -393,7 +393,7 @@ To address these, take the query from the previous exercise and 2. Compare buffer allocation size to the access index. 3. Report only the questionable entries. 
-*** Solution: +*** Solution #+INCLUDE: "example7b.ql" src java *** First 5 results @@ -446,7 +446,7 @@ To address these, take the query from the previous exercise and - This only handles very specific cases. Constructing counterexamples is easy. - We will address this in the next section. -*** Solution: +*** Solution #+INCLUDE: "example8.ql" src java *** First 5 results @@ -478,15 +478,15 @@ To address these, take the query from the previous exercise and - The product expression =sz * x * y= is not easily checked for equality. These are addressed in the next step. -*** Solution: +*** Solution #+INCLUDE: "example8a.ql" src java *** First 5 results #+INCLUDE: "../session-tests/example8a/example8a.expected" :lines "-6"’ ** Step 9 -- Global Value Numbering - Range analyis won't bound =sz * x * y=, and simpl equality checks don't work at - the structure level, so switch to global value numbering. + Range analyis won't bound =sz * x * y=, and simple equality checks don't work + at the structure level, so switch to global value numbering. This is the case in the last test case, #+begin_example @@ -527,122 +527,21 @@ To address these, take the query from the previous exercise and #+end_example we have to "evaluate" the expressions -- or at least bound them. 
-*** TODO incoporate - #+BEGIN_SRC java - int getFixedArrayOffset(ArrayExpr access) { - exists(Expr base, int offset | - offset = getExprOffsetValue(access.getArrayOffset(), base) and - result = getMaxStatedValue(base) + offset - ) - } - #+END_SRC - -*** TODO incoporate - #+BEGIN_SRC java - predicate isOffsetOutOfBoundsGVN(ArrayExpr access, FunctionCall source) { - ensureSameFunction(access, source) and - not isOffsetOutOfBoundsConstant(access, source, _, _) and - exists(Expr accessOffsetBase, int accessOffsetBaseValue | - accessOffsetBaseValue = getExprOffsetValue(access.getArrayOffset(), accessOffsetBase) and - globalValueNumber(source.getArgument(0)) = globalValueNumber(accessOffsetBase) and - not accessOffsetBaseValue < 0 - ) - } - #+END_SRC - -*** TODO incoporate - #+BEGIN_SRC java - /** - ,* @id cpp/array-access-out-of-bounds - ,* @description Access of an array with an index that is greater or equal to the element num. - ,* @kind problem - ,* @problem.severity error - ,*/ - - import cpp - import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - import semmle.code.cpp.dataflow.DataFlow - import semmle.code.cpp.valuenumbering.GlobalValueNumbering - import RuntimeValues - - from FunctionCall source, ArrayExpr access, string message - where - exists(int allocSize, int accessOffset | - isOffsetOutOfBoundsConstant(access, source, allocSize, accessOffset) and - message = - "Array access out of bounds: " + access.toString() + " with offset " + accessOffset.toString() - + " on $@ with size " + allocSize.toString() - ) - or - isOffsetOutOfBoundsGVN(access, source) and - message = "Array access with index that is greater or equal to the size of the $@." 
- select access, message, source, "allocation" - #+END_SRC - -*** interim - #+BEGIN_SRC java - /** - ,* @ kind problem - ,*/ - - import cpp - import semmle.code.cpp.dataflow.DataFlow - import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - import semmle.code.cpp.valuenumbering.GlobalValueNumbering - - // Step 9 - from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, Expr allocSizeExpr, int bufferSize, - int allocsize, Expr bufferSizeExpr, int arrayTypeSize, int allocBaseSize, int accessMax, - int allocatedUnits, int maxAccessedIndex - where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // buf[...] - // ^^^ int accessIdx - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - // Not really: - // allocSizeExpr = buffer.(Call).getArgument(0) and - // - DataFlow::localExprFlow(allocSizeExpr, buffer.(Call).getArgument(0)) and - allocsize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize and - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and - 1 = allocBaseSize and - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - upperBound(accessIdx) = accessMax and - allocBaseSize * allocsize = allocatedUnits and - arrayTypeSize * accessMax = maxAccessedIndex and - // only consider out-of-bounds - maxAccessedIndex >= allocatedUnits - select access, - "Array access at or beyond size; have " + allocatedUnits + " units, access at " + maxAccessedIndex, - globalValueNumber(accessIdx) as gvnAccess, globalValueNumber(allocSizeExpr) as gvnAlloc - #+END_SRC - -*** interim - Messy, start over. 
+ XX: + For the cases with variable =malloc= sizes, like =test_const_branch=, GVN + identifies same-value constant accesses, but we need a special case for + same-value expression accesses. +*** Solution #+INCLUDE: "example9.ql" src java - + +*** First 5 results + Results note: + - The allocation size of 200 is never used in an access, so the GVN match + eliminates it from the result list. + + #+INCLUDE: "../session-tests/Example9/example9.expected" :lines "-6"’ + ** TODO hashconsing import semmle.code.cpp.valuenumbering.HashCons From 903b752824384cc4bdfd6964c43c5755e3a43b30 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 23 May 2023 12:32:38 -0700 Subject: [PATCH 25/28] export to plain markdown to fix github rendering --- session/session.md | 2822 +++++++++++++++++++++++++++++--------------- 1 file changed, 1859 insertions(+), 963 deletions(-) diff --git a/session/session.md b/session/session.md index c89bd0e..c8c2a0f 100644 --- a/session/session.md +++ b/session/session.md @@ -1,51 +1,54 @@ -- [CodeQL Workshop — Using Data-Flow and Range Analysis to Find Out-Of-Bounds Accesses](#codeql-workshop--using-data-flow-and-range-analysis-to-find-out-of-bounds-accesses) -- [Acknowledgments](#acknowledgments) -- [Setup Instructions](#setup-instructions) -- [Introduction](#introduction) -- [A Note on the Scope of This Workshop](#a-note-on-the-scope-of-this-workshop) -- [Session/Workshop notes](#sessionworkshop-notes) - - [Step 1](#exercise-1) - - [Hints](#hints) - - [Solution](#orgf0fce94) - - [Step 2](#orgebf05ab) - - [Hints](#hints) - - [Solution](#org0096fb0) - - [Results](#org8fddea8) - - [Step 3](#exercise-2) - - [Solution](#org3698b58) - - [Results](#org00c9191) - - [Step 4](#org5cda1ae) - - [Hint](#orge721980) - - [Solution](#orgc515483) - - [Results](#org57d3bda) - - [Step 4a – some clean-up using predicates](#org0982d85) - - [Solution](#orgb26e7e5) - - [Step 5 – SimpleRangeAnalysis](#orgb82a277) - - [Solution](#orgad1e7b0) - - [First 5 results](#org2740a32) 
- - [Step 6](#orgd745b01) - - [Solution](#org8d5d426) - - [First 5 results](#orgebf1dcb) - - [Step 7](#org815886d) - - [Solution](#orgfc8c990) - - [First 5 results](#org3b1615b) - - [Step 7a](#orgc92b420) - - [Solution](#org34ac413) - - [First 5 results](#org6ad3b1e) - - [Step 7b](#org9cc9de8) - - [Solution](#orge60a202) - - [First 5 results](#orgcb20f8d) - - [Step 8](#org2bd04bc) - - [Solution](#org06f95fe) - - [First 5 results](#org46e475c) - - [Interim notes](#orgef00db2) - - [Step 8a](#org3d75997) - - [Solution](#orgf75a887) - - [First 5 results](#orgb242afe) - - [Step 9 – Global Value Numbering](#org0abea3c) - - [Solution](#orga707c2b) - - [First 5 results](#orgd058b22) - - [hashconsing](#org2e8a639) + +# Table of Contents + +1. [CodeQL Workshop — Using Data-Flow and Range Analysis to Find Out-Of-Bounds Accesses](#codeql-workshop--using-data-flow-and-range-analysis-to-find-out-of-bounds-accesses) +2. [Acknowledgments](#acknowledgments) +3. [Setup Instructions](#setup-instructions) +4. [Introduction](#introduction) +5. [A Note on the Scope of This Workshop](#a-note-on-the-scope-of-this-workshop) +6. [Session/Workshop notes](#sessionworkshop-notes) + 1. [Step 1](#exercise-1) + 1. [Hints](#hints) + 2. [Solution](#org14d20ad) + 2. [Step 2](#org6996134) + 1. [Hints](#hints) + 2. [Solution](#orge54f273) + 3. [Results](#org7721736) + 3. [Step 3](#exercise-2) + 1. [Solution](#org77a77b4) + 2. [Results](#org14b2eb8) + 4. [Step 4](#org70ec45b) + 1. [Hint](#org952151f) + 2. [Solution](#org443dc33) + 3. [Results](#org9eba298) + 5. [Step 4a – some clean-up using predicates](#orga1b1648) + 1. [Solution](#orgf6ab8fd) + 6. [Step 5 – SimpleRangeAnalysis](#orga0ae19d) + 1. [Solution](#org38203d6) + 2. [First 5 results](#org8d0b049) + 7. [Step 6](#org2e181e8) + 1. [Solution](#org7ff86a4) + 2. [First 5 results](#org35eb492) + 8. [Step 7](#orgbaba437) + 1. [Solution](#org2558217) + 2. [First 5 results](#org319c753) + 9. [Step 7a](#org5c3cbb9) + 1. [Solution](#org631c47f) + 2. 
[First 5 results](#orgdcbb8ea) + 10. [Step 7b](#org9b279f6) + 1. [Solution](#org54470ce) + 2. [First 5 results](#orga2d47ca) + 11. [Step 8](#orgbe1a4ba) + 1. [Solution](#org966c6c5) + 2. [First 5 results](#org9c29a8e) + 12. [Interim notes](#org39ee1c0) + 13. [Step 8a](#org477a7f7) + 1. [Solution](#orgf806ffb) + 2. [First 5 results](#org18e8bda) + 14. [Step 9 – Global Value Numbering](#org5b5e629) + 1. [Solution](#orgc717ad3) + 2. [First 5 results](#orgf97dbbc) + 15. [hashconsing](#orgc221436) @@ -57,7 +60,12 @@ # Acknowledgments -This session-based workshop is based on the exercise/unit-test-based material at , which in turn is based on a significantly simplified and modified version of the [OutOfBounds.qll library](https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll) from the [CodeQL Coding Standards repository](https://github.com/github/codeql-coding-standards). +This session-based workshop is based on the exercise/unit-test-based material at +, which in turn is +based on a significantly simplified and modified version of the +[OutOfBounds.qll library](https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll) from the +[CodeQL Coding Standards +repository](https://github.com/github/codeql-coding-standards). @@ -66,106 +74,145 @@ This session-based workshop is based on the exercise/unit-test-based material at - Install [Visual Studio Code](https://code.visualstudio.com/). -- Install the [CodeQL extension for Visual Studio Code](https://codeql.github.com/docs/codeql-for-visual-studio-code/setting-up-codeql-in-visual-studio-code/). +- Install the + [CodeQL extension for Visual Studio Code](https://codeql.github.com/docs/codeql-for-visual-studio-code/setting-up-codeql-in-visual-studio-code/). -- Install the latest version of the [CodeQL CLI](https://github.com/github/codeql-cli-binaries/releases). 
+- Install the latest version of the + [CodeQL CLI](https://github.com/github/codeql-cli-binaries/releases). - Clone this repository: - ```sh - git clone https://github.com/hohn/codeql-workshop-runtime-values-c - ``` + git clone https://github.com/hohn/codeql-workshop-runtime-values-c -- Install the CodeQL pack dependencies using the command `CodeQL: Install Pack Dependencies` and select `exercises`, `solutions`, `exercises-tests`, `session`, `session-db` and `solutions-tests` from the list of packs. +- Install the CodeQL pack dependencies using the command + `CodeQL: Install Pack Dependencies` and select `exercises`, + `solutions`, `exercises-tests`, `session`, `session-db` and + `solutions-tests` from the list of packs. -- If you have CodeQL on your PATH, build the database using `build-database.sh` and load the database with the VS Code CodeQL extension. It is at `session-db/cpp-runtime-values-db`. - - Alternatively, you can download [this pre-built database](https://drive.google.com/file/d/1N8TYJ6f4E33e6wuyorWHZHVCHBZy8Bhb/view?usp=sharing). +- If you have CodeQL on your PATH, build the database using + `build-database.sh` and load the database with the VS Code CodeQL + extension. It is at `session-db/cpp-runtime-values-db`. + - Alternatively, you can download + [this + pre-built database](https://drive.google.com/file/d/1N8TYJ6f4E33e6wuyorWHZHVCHBZy8Bhb/view?usp=sharing). -- If you do **not** have CodeQL on your PATH, build the database using the unit test sytem. Choose the `TESTING` tab in VS Code, run the `session-db/DB/db.qlref` test. The test will fail, but it leaves a usable CodeQL database in `session-db/DB/DB.testproj`. +- If you do **not** have CodeQL on your PATH, build the database using the + unit test sytem. Choose the `TESTING` tab in VS Code, run the + `session-db/DB/db.qlref` test. The test will fail, but it leaves a + usable CodeQL database in `session-db/DB/DB.testproj`. -- ❗Important❗: Run `initialize-qltests.sh` to initialize the tests. 
Otherwise, you will not be able to run the QLTests in `exercises-tests`. +- ❗Important❗: Run `initialize-qltests.sh` to initialize the tests. + Otherwise, you will not be able to run the QLTests in + `exercises-tests`. # Introduction -This workshop focuses on analyzing and relating two values — array access indices and memory allocation sizes — in order to identify simple cases of out-of-bounds array accesses. +This workshop focuses on analyzing and relating two values — array +access indices and memory allocation sizes — in order to identify +simple cases of out-of-bounds array accesses. -The following snippets demonstrate how an out-of-bounds array access can occur: +The following snippets demonstrate how an out-of-bounds array access can +occur: -```cpp -char* buffer = malloc(10); -buffer[9] = 'a'; // ok -buffer[10] = 'b'; // out-of-bounds -``` + char* buffer = malloc(10); + buffer[9] = 'a'; // ok + buffer[10] = 'b'; // out-of-bounds A more complex example: -```cpp -char* buffer; -if(rand() == 1) { - buffer = malloc(10); -} -else { - buffer = malloc(11); -} -size_t index = 0; -if(rand() == 1) { - index = 10; -} -buffer[index]; // potentially out-of-bounds depending on control-flow -``` - -Another common case *not* covered in this introductory workshop involves loops, as follows: - -```cpp -int elements[5]; -for (int i = 0; i <= 5; ++i) { - elements[i] = 0; -} -``` - -To find these issues, we can implement an analysis that tracks the upper or lower bounds on an expression and, combined with data-flow analysis to reduce false-positives, identifies cases where the index of the array results in an access beyond the allocated size of the buffer. 
+ char* buffer; + if(rand() == 1) { + buffer = malloc(10); + } + else { + buffer = malloc(11); + } + size_t index = 0; + if(rand() == 1) { + index = 10; + } + buffer[index]; // potentially out-of-bounds depending on control-flow + +Another common case *not* covered in this introductory workshop involves +loops, as follows: + + int elements[5]; + for (int i = 0; i <= 5; ++i) { + elements[i] = 0; + } + +To find these issues, we can implement an analysis that tracks the upper +or lower bounds on an expression and, combined with data-flow analysis +to reduce false-positives, identifies cases where the index of the array +results in an access beyond the allocated size of the buffer. # A Note on the Scope of This Workshop -This workshop is not intended to be a complete analysis that is useful for real-world cases of out-of-bounds analyses for reasons including but not limited to: +This workshop is not intended to be a complete analysis that is useful +for real-world cases of out-of-bounds analyses for reasons including but +not limited to: - Missing support for loops and recursion - No interprocedural analysis - Missing size calculation of arrays where the element size is not 1 -- No support for pointer arithmetic or in general, operations other than addition and subtraction +- No support for pointer arithmetic or in general, operations other than + addition and subtraction - Overly specific modelling of a buffer access as an array expression -The goal of this workshop is rather to demonstrate the building blocks of analyzing run-time values and how to apply those building blocks to modelling a common class of vulnerability. A more comprehensive and production-appropriate example is the [OutOfBounds.qll library](https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll) from the [CodeQL Coding Standards repository](https://github.com/github/codeql-coding-standards). 
+The goal of this workshop is rather to demonstrate the building blocks +of analyzing run-time values and how to apply those building blocks to +modelling a common class of vulnerability. A more comprehensive and +production-appropriate example is the +[OutOfBounds.qll +library](https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll) from the +[CodeQL Coding +Standards repository](https://github.com/github/codeql-coding-standards). # Session/Workshop notes -Unlike the the [exercises](../README.md#org3b74422) which use the *collection* of test problems in `exercises-test`, this workshop is a sequential session following the actual process of writing CodeQL: use a *single* database built from a single, larger segment of code and inspect the query results as you write the query. +Unlike the the [exercises](../README.md#org3b74422) which use the *collection* of test problems in +`exercises-test`, this workshop is a sequential session following the actual +process of writing CodeQL: use a *single* database built from a single, larger +segment of code and inspect the query results as you write the query. -For this workshop, the larger segment of code is still simplified skeleton code, not a full source code repository. +For this workshop, the larger segment of code is still simplified skeleton code, +not a full source code repository. -The queries are embedded in \`session.md\` but can also be found in the \`example\*.ql\` files. They can all be run as test cases in VS Code. +The queries are embedded in \`session.md\` but can also be found in the +\`example\*.ql\` files. They can all be run as test cases in VS Code. To reiterate: -This workshop focuses on analyzing and relating two *static* values — array access indices and memory allocation sizes — in order to identify simple cases of out-of-bounds array accesses. We do not handle *dynamic* values but take advantage of special cases. 
+This workshop focuses on analyzing and relating two *static* values — array +access indices and memory allocation sizes — in order to identify +simple cases of out-of-bounds array accesses. We do not handle *dynamic* values +but take advantage of special cases. To find these issues, -1. We can implement an analysis that tracks the upper or lower bounds on an expression. -2. We then combine this with data-flow analysis to reduce false positives and identify cases where the index of the array results in an access beyond the allocated size of the buffer. -3. We further extend these queries with rudimentary arithmetic support involving expressions common to the allocation and the array access. -4. For cases where constant expressions are not available or are uncertain, we first try [range analysis](#orgb82a277) to expand the query's applicability. -5. For cases where this is insufficient, we introduce global value numbering [GVN](https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering) in [Step 9 – Global Value Numbering](#org0abea3c), to detect values known to be equal at runtime. -6. When *those* cases are insufficient, we handle the case of identical structure using [hashconsing](#org2e8a639). +1. We can implement an analysis that tracks the upper or lower bounds on an + expression. +2. We then combine this with data-flow analysis to reduce false positives and + identify cases where the index of the array results in an access beyond the + allocated size of the buffer. +3. We further extend these queries with rudimentary arithmetic support involving + expressions common to the allocation and the array access. +4. For cases where constant expressions are not available or are uncertain, we + first try [range analysis](#orga0ae19d) to expand the query's applicability. +5. 
For cases where this is insufficient, we introduce global value numbering + [GVN](https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering) in [Step 9 – Global Value Numbering](#org5b5e629), to detect values known to be equal + at runtime. +6. When *those* cases are insufficient, we handle the case of identical + structure using [hashconsing](#orgc221436). @@ -175,9 +222,11 @@ To find these issues, In the first step we are going to 1. identify a dynamic allocation with `malloc` and -2. an access to that allocated buffer. The access is via an array expression; we are **not** going to cover pointer dereferencing. +2. an access to that allocated buffer. The access is via an array expression; + we are **not** going to cover pointer dereferencing. -The goal of this exercise is to then output the array access, array size, buffer, and buffer offset. +The goal of this exercise is to then output the array access, array size, +buffer, and buffer offset. The focus here is on @@ -194,46 +243,47 @@ in [db.c](file:///Users/hohn/local/codeql-workshop-runtime-values-c/session-db/D ### Hints -1. `Expr::getValue()::toInt()` can be used to get the integer value of a constant expression. +1. `Expr::getValue()::toInt()` can be used to get the integer value of a + constant expression. - + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow - -from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr -where - // malloc (100) - // ^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] 
- // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset().getValue().toInt() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - allocSizeExpr = buffer.(Call).getArgument(0) and - bufferSize = allocSizeExpr.getValue().toInt() -select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr -``` + import cpp + import semmle.code.cpp.dataflow.DataFlow + + from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr + where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + bufferSize = allocSizeExpr.getValue().toInt() + select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr This produces 12 results, with some cross-function pairs. - + ## Step 2 -The previous query fails to connect the `malloc` calls with the array accesses, and in the results, `mallocs` from one function are paired with accesses in another. +The previous query fails to connect the `malloc` calls with the array accesses, +and in the results, `mallocs` from one function are paired with accesses in +another. To address these, take the query from the previous exercise and @@ -245,57 +295,57 @@ To address these, take the query from the previous exercise and ### Hints -1. Use `DataFlow::localExprFlow()` to relate the allocated buffer to the array base. -2. The the array base is the `buf` part of `buf[0]`. Use the `Expr.getArrayBase()` predicate. +1. Use `DataFlow::localExprFlow()` to relate the allocated buffer to the + array base. +2. The the array base is the `buf` part of `buf[0]`. Use the + `Expr.getArrayBase()` predicate. 
- + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow - -// Step 2 -// void test_const(void) -// void test_const_var(void) -from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr -where - // malloc (100) - // ^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] - // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset().getValue().toInt() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - allocSizeExpr = buffer.(Call).getArgument(0) and - bufferSize = allocSizeExpr.getValue().toInt() and - // - // Ensure buffer access is to the correct allocation. - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) -select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr -``` - - - + import cpp + import semmle.code.cpp.dataflow.DataFlow + + // Step 2 + // void test_const(void) + // void test_const_var(void) + from AllocationExpr buffer, ArrayExpr access, int bufferSize, int accessIdx, Expr allocSizeExpr + where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + bufferSize = allocSizeExpr.getValue().toInt() and + // + // Ensure buffer access is to the correct allocation. + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) + select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr + + + ### Results -There are now 3 results. These are from only one function, the one using constants. +There are now 3 results. 
These are from only one function, the one using constants. @@ -304,143 +354,140 @@ There are now 3 results. These are from only one function, the one using constan The previous results need to be extended to the case -```c++ -void test_const_var(void) -{ - unsigned long size = 100; - char *buf = malloc(size); - buf[0]; // COMPLIANT - ... -} -``` + void test_const_var(void) + { + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + ... + } -Here, the `malloc` argument is a variable with known value. +Here, the `malloc` argument is a variable with known value. We include this result by removing the size-retrieval from the prior query. - + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow - -// Step 3 -// void test_const_var(void) -from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr -where - // malloc (100) - // ^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] - // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset().getValue().toInt() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - allocSizeExpr = buffer.(Call).getArgument(0) and - // bufferSize = allocSizeExpr.getValue().toInt() and - // - // Ensure buffer access is to the correct allocation. - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) -select buffer, access, accessIdx, access.getArrayOffset() -``` - - - + import cpp + import semmle.code.cpp.dataflow.DataFlow + + // Step 3 + // void test_const_var(void) + from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr + where + // malloc (100) + // ^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] 
+ // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + // bufferSize = allocSizeExpr.getValue().toInt() and + // + // Ensure buffer access is to the correct allocation. + // char *buf = ... buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) + select buffer, access, accessIdx, access.getArrayOffset() + + + ### Results Now, we get 12 results, including some from other test cases. - + ## Step 4 -We are looking for out-of-bounds accesses, so we to need to include the bounds. But in a more general way than looking only at constant values. +We are looking for out-of-bounds accesses, so we to need to include the +bounds. But in a more general way than looking only at constant values. -Note the results for the cases in `test_const_var` which involve a variable access rather than a constant. The next goal is +Note the results for the cases in `test_const_var` which involve a variable +access rather than a constant. The next goal is -1. to handle the case where the allocation size or array index are variables (with constant values) rather than integer constants. +1. to handle the case where the allocation size or array index are variables + (with constant values) rather than integer constants. We have an expression `size` that flows into the `malloc()` call. - + ### Hint - + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow - -// Step 4 -from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr, int bufferSize, Expr bse -where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] 
- // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset().getValue().toInt() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - allocSizeExpr = buffer.(Call).getArgument(0) and - // bufferSize = allocSizeExpr.getValue().toInt() and - // - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - exists(Expr bufferSizeExpr | - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize - and bse = bufferSizeExpr - ) and - // Ensure buffer access is to the correct allocation. - // char *buf = ... buf[0]; - // ^^^ ---> ^^^ - // or - // malloc(100); buf[0] - // ^^^ --------> ^^^ - // - DataFlow::localExprFlow(buffer, access.getArrayBase()) -select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bse -``` - - - + import cpp + import semmle.code.cpp.dataflow.DataFlow + + // Step 4 + from AllocationExpr buffer, ArrayExpr access, int accessIdx, Expr allocSizeExpr, int bufferSize, Expr bse + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + allocSizeExpr = buffer.(Call).getArgument(0) and + // bufferSize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + exists(Expr bufferSizeExpr | + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + and bse = bufferSizeExpr + ) and + // Ensure buffer access is to the correct allocation. + // char *buf = ... 
buf[0]; + // ^^^ ---> ^^^ + // or + // malloc(100); buf[0] + // ^^^ --------> ^^^ + // + DataFlow::localExprFlow(buffer, access.getArrayBase()) + select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bse + + + ### Results Now, we get 15 results, limited to statically determined values. - + ## Step 4a – some clean-up using predicates @@ -448,99 +495,109 @@ Note that the dataflow automatically captures/includes the allocSizeExpr = buffer.(Call).getArgument(0) -so that's now redundant with `bufferSizeExpr` and can be removed. - -```java - -allocSizeExpr = buffer.(Call).getArgument(0) and -// bufferSize = allocSizeExpr.getValue().toInt() and -// -// unsigned long size = 100; -// ... -// char *buf = malloc(size); -DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and +so that's now redundant with `bufferSizeExpr` and can be removed. -``` + + allocSizeExpr = buffer.(Call).getArgument(0) and + // bufferSize = allocSizeExpr.getValue().toInt() and + // + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and Also, simplify the `from...where...select`: 1. Remove unnecessary `exists` clauses. -2. Use `DataFlow::localExprFlow` for the buffer and allocation sizes, with `getValue().toInt()` as one possibility (one predicate). +2. Use `DataFlow::localExprFlow` for the buffer and allocation sizes, with + `getValue().toInt()` as one possibility (one predicate). - + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow - -from AllocationExpr buffer, ArrayExpr access, int accessIdx, int bufferSize, Expr bufferSizeExpr -where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] 
- // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset().getValue().toInt() and - getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure buffer access refers to the matching allocation - // ensureSameFunction(buffer, access.getArrayBase()) and - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // Ensure buffer access refers to the matching allocation - // ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) - // -select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bufferSizeExpr - -/** - * Gets an expression that flows to the allocation (which includes those already in the allocation) - * and has a constant value. - */ -predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { - exists(AllocationExpr buffer | - // - // Capture BOTH with datflow: - // 1. - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - // 2. - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize - ) -} -``` + import cpp + import semmle.code.cpp.dataflow.DataFlow + + from AllocationExpr buffer, ArrayExpr access, int accessIdx, int bufferSize, Expr bufferSizeExpr + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] 
+ // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset().getValue().toInt() and + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure buffer access refers to the matching allocation + // ensureSameFunction(buffer, access.getArrayBase()) and + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure buffer access refers to the matching allocation + // ensureSameFunction(bufferSizeExpr, buffer.getSizeExpr()) and + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) + // + select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bufferSizeExpr + + /** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ + predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // + // Capture BOTH with datflow: + // 1. + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // 2. + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) + } - + ## Step 5 – SimpleRangeAnalysis -Running the query from Step 2 against the database yields a significant number of missing or incorrect results. The reason is that although great at identifying compile-time constants and their use, data-flow analysis is not always the right tool for identifying the *range* of values an `Expr` might have, particularly when multiple potential constants might flow to an `Expr`. +Running the query from Step 2 against the database yields a +significant number of missing or incorrect results. The reason is that +although great at identifying compile-time constants and their use, +data-flow analysis is not always the right tool for identifying the +*range* of values an `Expr` might have, particularly when multiple +potential constants might flow to an `Expr`. 
-The range analysis already handles conditional branches; we don't have to use guards on data flow – don't implement your own interpreter if you can use the library. +The range analysis already handles conditional branches; we don't +have to use guards on data flow – don't implement your own interpreter +if you can use the library. -The CodeQL standard library has several mechanisms for addressing this problem; in the remainder of this workshop we will explore two of them: `SimpleRangeAnalysis` and, later, `GlobalValueNumbering`. +The CodeQL standard library has several mechanisms for addressing this +problem; in the remainder of this workshop we will explore two of them: +`SimpleRangeAnalysis` and, later, `GlobalValueNumbering`. -Although not in the scope of this workshop, a standard use-case for range analysis is reliably identifying integer overflow and validating integer overflow checks. +Although not in the scope of this workshop, a standard use-case for +range analysis is reliably identifying integer overflow and validating +integer overflow checks. -Now, add the use of the `SimpleRangeAnalysis` library. Specifically, the relevant library predicates are `upperBound` and `lowerBound`, to be used with the buffer access argument. +Now, add the use of the `SimpleRangeAnalysis` library. Specifically, the +relevant library predicates are `upperBound` and `lowerBound`, to be used with +the buffer access argument. Notes: - This requires the import import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis -- We are not limiting the array access to integers any longer. Thus, we just use +- We are not limiting the array access to integers any longer. 
Thus, we just + use accessIdx = access.getArrayOffset() - To see the results in the order used in the C code, use @@ -548,80 +605,164 @@ Notes: select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax - + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - - -from AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr -where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] - // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure buffer access is to the correct allocation. - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // Ensure use refers to the correct size defintion, even for non-constant - // expressions. - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) - // -select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax - -/** - * Gets an expression that flows to the allocation (which includes those already in the allocation) - * and has a constant value. - */ -predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { - exists(AllocationExpr buffer | - // - // Capture BOTH with datflow: - // 1. - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - // 2. - // unsigned long size = 100; - // ... 
- // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize - ) -} -``` + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + + from AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size defintion, even for non-constant + // expressions. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) + // + select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax + + /** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ + predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // + // Capture BOTH with datflow: + // 1. + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // 2. + // unsigned long size = 100; + // ... 
+ // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) + } - + ### First 5 results -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | - - - + + + +++ ++ ++ ++ ++ ++ ++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 |
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 |
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 |
| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 |
| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 |
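
To make the Step 5 motivation concrete, here is a small C sketch of an index that two different constants can reach. It is not one of the workshop test cases, and `pick_index` and `pick_index_upper_bound` are hypothetical names used only for illustration. Constant-value data flow has no single value to report for `idx`, whereas a range analysis would be expected to report the largest reachable value, which the second helper simply computes by brute force.

```c
#include <stddef.h>

/* Hypothetical helper: either 10 or 200 can reach `idx`,
 * depending on the branch taken. */
size_t pick_index(int use_large) {
    size_t idx = 10;
    if (use_large) {
        idx = 200;
    }
    return idx;
}

/* Largest value `idx` can take over both paths. This is the number
 * an upper-bound analysis is meant to approximate statically. */
size_t pick_index_upper_bound(void) {
    size_t small = pick_index(0);
    size_t large = pick_index(1);
    return small > large ? small : large;
}
```

With a 100-byte `char` buffer, `buf[pick_index(0)]` stays in bounds while `buf[pick_index(1)]` does not, so the upper bound is the value to compare against the allocation size.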
+ + + ## Step 6 -To finally determine (some) out-of-bounds accesses, we have to convert allocation units (usually in bytes) to size units. Then we are finally in a position to compare buffer allocation size to the access index to find out-of-bounds accesses – at least for expressions with known values. +To finally determine (some) out-of-bounds accesses, we have to convert +allocation units (usually in bytes) to size units. Then we are finally in a +position to compare buffer allocation size to the access index to find +out-of-bounds accesses – at least for expressions with known values. Add these to the query: @@ -630,165 +771,406 @@ Add these to the query: Hints: -1. We need the size of the array element. Use `access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType()` to see the type and `access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()` to get its size. +1. We need the size of the array element. Use + `access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType()` + to see the type and + `access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize()` + to get its size. -2. Note from the docs: *The malloc() function allocates size bytes of memory and returns a pointer to the allocated memory.* So `size = 1` +2. Note from the docs: + *The malloc() function allocates size bytes of memory and returns a pointer + to the allocated memory.* + So `size = 1` -3. These test cases all use type `char`. What would happen for `int` or `double`? +3. These test cases all use type `char`. What would happen for `int` or + `double`? - + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - -from AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr -where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // - // buf[...] - // ^^^ ArrayExpr access - // - // buf[...] 
- // ^^^ int accessIdx - // - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure buffer access is to the correct allocation. - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // Ensure use refers to the correct size defintion, even for non-constant - // expressions. - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) - // -select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, - access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, - access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, - 1 as allocBaseSize - -/** - * Gets an expression that flows to the allocation (which includes those already in the allocation) - * and has a constant value. - */ -predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { - exists(AllocationExpr buffer | - // - // Capture BOTH with datflow: - // 1. - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - // 2. - // unsigned long size = 100; - // ... - // char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize - ) -} -``` + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + from AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // + // buf[...] + // ^^^ ArrayExpr access + // + // buf[...] + // ^^^ int accessIdx + // + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure buffer access is to the correct allocation. 
+ DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size defintion, even for non-constant + // expressions. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) + // + select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() as arrayTypeSize, + 1 as allocBaseSize + + /** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ + predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // + // Capture BOTH with datflow: + // 1. + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + // 2. + // unsigned long size = 100; + // ... + // char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) + } - + ### First 5 results -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | | char | 1 | 1 | -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | | char | 1 | 1 | -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | | char | 1 | 1 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | | char | 1 | 1 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | | char | 1 | 1 | - - - + + + +++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + 
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 1 | 1 |
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 |
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 1 | 1 |
| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 1 | 1 |
| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 |
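
Hint 3 of Step 6 asks what would happen for `int` or `double`. As a rough sketch, the conversion from allocated bytes to array elements is a division by the element size. `element_capacity` is a hypothetical helper, and the 4-byte and 8-byte sizes mentioned below are assumptions about a typical platform, not something the workshop fixes.

```c
#include <stddef.h>

/* Number of whole elements of `elem_size` bytes that fit
 * into an allocation of `alloc_bytes` bytes. */
size_t element_capacity(size_t alloc_bytes, size_t elem_size) {
    return alloc_bytes / elem_size;
}
```

Under those assumptions, `malloc(100)` holds 100 `char` elements but only 25 four-byte `int` elements or 12 eight-byte `double` elements, so an index of 99 that is valid for `char` would be far out of bounds for the other two types.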
+ + + ## Step 7 1. Clean up the query. 2. Compare buffer allocation size to the access index. -3. Add expressions for `allocatedUnits` (from the malloc) and a `maxAccessedIndex` (from array accesses) +3. Add expressions for `allocatedUnits` (from the malloc) and a + `maxAccessedIndex` (from array accesses) 1. Calculate the `accessOffset` / `maxAccessedIndex` (from array accesses) 2. Calculate the `allocSize` / `allocatedUnits` (from the malloc) 3. Compare them - + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - -from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr, - int arrayTypeSize, int allocBaseSize -where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // buf[...] - // ^^^ ArrayExpr access - // buf[...] - // ^^^ int accessIdx - accessIdx = access.getArrayOffset() and - // - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // - getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure buffer access is to the correct allocation. - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // Ensure use refers to the correct size defintion, even for non-constant - // expressions. - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - // - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and - 1 = allocBaseSize -// -select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, - access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, - allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex - -/** - * Gets an expression that flows to the allocation (which includes those already in the allocation) - * and has a constant value. 
- */ -predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { - exists(AllocationExpr buffer | - // Capture BOTH with datflow: - // 1. - // malloc (100) - // ^^^ allocSizeExpr / bufferSize - // 2. - // unsigned long size = 100; ... ; char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize - ) -} -``` + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr, + int arrayTypeSize, int allocBaseSize + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + // + // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size defintion, even for non-constant + // expressions. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + // + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + 1 = allocBaseSize + // + select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, + allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex + + /** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ + predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // Capture BOTH with datflow: + // 1. 
+ // malloc (100) + // ^^^ allocSizeExpr / bufferSize + // 2. + // unsigned long size = 100; ... ; char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) + } - + ### First 5 results -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | | char | 100 | 0.0 | -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | | char | 100 | 99.0 | -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | | char | 100 | 100.0 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | | char | 100 | 0.0 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | | char | 100 | 99.0 | - - - + + + +++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 100 | 0.0 |
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 100 | 99.0 |
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 100 | 100.0 |
| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 100 | 0.0 |
| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 100 | 99.0 |
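
The arithmetic behind the last two columns of the Step 7 results can be mirrored in plain C. This is only an illustration of the computation, not part of the query. The struct and function names are hypothetical, the field names follow the `select` clause, and the `>=` test is the comparison that Step 7b later uses to report questionable accesses.

```c
#include <stdbool.h>

/* Hypothetical mirror of the query's computed columns. */
struct bounds_check {
    long allocated_units;    /* allocBaseSize * bufferSize */
    long max_accessed_index; /* arrayTypeSize * accessMax */
    bool out_of_bounds;      /* maxAccessedIndex >= allocatedUnits */
};

struct bounds_check check_access(long buffer_size, long alloc_base_size,
                                 long array_type_size, long access_max) {
    struct bounds_check result;
    result.allocated_units = alloc_base_size * buffer_size;
    result.max_accessed_index = array_type_size * access_max;
    result.out_of_bounds =
        result.max_accessed_index >= result.allocated_units;
    return result;
}
```

For the first three result rows (`malloc(100)` with indices 0, 99, and 100 on a `char` buffer), only the access at index 100 is flagged.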
+ + + ## Step 7a @@ -796,72 +1178,202 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { 2. Put all expressions into the select for review. - + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - -from - AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr, - int arrayTypeSize, int allocBaseSize -where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // buf[...] - // ^^^^^^^^ ArrayExpr access - // ^^^ int accessIdx - accessIdx = access.getArrayOffset() and - getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure buffer access is to the correct allocation. - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // Ensure use refers to the correct size defintion, even for non-constant - // expressions. - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - // - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and - 1 = allocBaseSize -// -select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, - access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, - buffer.getSizeMult() as bufferBaseTypeSize, - arrayBaseType.getSize() as arrayBaseTypeSize, - allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex - -/** - * Gets an expression that flows to the allocation (which includes those already in the allocation) - * and has a constant value. - */ -predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { - exists(AllocationExpr buffer | - // Capture BOTH with datflow: - // 1. - // malloc (100) - // ^^^ bufferSize - // 2. - // unsigned long size = 100; ... 
; char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize - ) -} -``` + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + from + AllocationExpr buffer, ArrayExpr access, Expr accessIdx, int bufferSize, Expr bufferSizeExpr, + int arrayTypeSize, int allocBaseSize + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // buf[...] + // ^^^^^^^^ ArrayExpr access + // ^^^ int accessIdx + accessIdx = access.getArrayOffset() and + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size defintion, even for non-constant + // expressions. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + // + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + 1 = allocBaseSize + // + select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax, bufferSize, + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() as arrayBaseType, + buffer.getSizeMult() as bufferBaseTypeSize, + arrayBaseType.getSize() as arrayBaseTypeSize, + allocBaseSize * bufferSize as allocatedUnits, arrayTypeSize * accessMax as maxAccessedIndex + + /** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ + predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // Capture BOTH with datflow: + // 1. + // malloc (100) + // ^^^ bufferSize + // 2. + // unsigned long size = 100; ... 
; char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) + } - + ### First 5 results -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | | char | 1 | 1 | 100 | 0.0 | -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | | char | 1 | 1 | 100 | 99.0 | -| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | | char | 1 | 1 | 100 | 100.0 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | | char | 1 | 1 | 100 | 0.0 | -| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | | char | 1 | 1 | 100 | 99.0 | - - - + + + +++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | test.c:8:9:8:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 0.0 |
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | test.c:9:9:9:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 99.0 |
| test.c:7:24:7:26 | 100 | test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | test.c:10:9:10:11 | 100 | 100.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 100.0 |
| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | test.c:17:9:17:9 | 0 | 0.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 0.0 |
| test.c:15:26:15:28 | 100 | test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | test.c:18:9:18:10 | 99 | 99.0 | 100 | file://:0:0:0:0 | char | 1 | 1 | 100 | 99.0 |
+ + + ## Step 7b @@ -870,280 +1382,429 @@ predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { 3. Report only the questionable entries. - + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - -from - AllocationExpr buffer, ArrayExpr access, int bufferSize, Expr bufferSizeExpr, - int maxAccessedIndex, int allocatedUnits -where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - getAllocConstantExpr(bufferSizeExpr, bufferSize) and - // Ensure buffer access is to the correct allocation. - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // Ensure use refers to the correct size defintion, even for non-constant - // expressions. - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - // computeIndices(access, buffer, bufferSize, allocatedUnits, maxAccessedIndex) - computeAllocationSize(buffer, bufferSize, allocatedUnits) and - computeMaxAccess(access, maxAccessedIndex) - // only consider out-of-bounds - and - maxAccessedIndex >= allocatedUnits -select access, - "Array access at or beyond size; have " + allocatedUnits + " units, access at " + maxAccessedIndex - -// select bufferSizeExpr, buffer, access, allocatedUnits, maxAccessedIndex - -/** - * Compute the maximum accessed index. - */ -predicate computeMaxAccess(ArrayExpr access, int maxAccessedIndex) { - exists( - int arrayTypeSize, int accessMax, Type arrayBaseType, int arrayBaseTypeSize, Expr accessIdx - | - // buf[...] - // ^^^^^^^^ ArrayExpr access - // ^^^ - accessIdx = access.getArrayOffset() and - upperBound(accessIdx) = accessMax and - arrayBaseType.getSize() = arrayBaseTypeSize and - access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() = arrayBaseType and - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and - arrayTypeSize * accessMax = maxAccessedIndex - ) -} - -/** - * Compute the allocation size. 
- */ -bindingset[bufferSize] -predicate computeAllocationSize(AllocationExpr buffer, int bufferSize, int allocatedUnits) { - exists(int bufferBaseTypeSize, Type arrayBaseType, int arrayBaseTypeSize | - // buf[...] - // ^^^^^^^^ ArrayExpr access - // ^^^ - buffer.getSizeMult() = bufferBaseTypeSize and - arrayBaseType.getSize() = arrayBaseTypeSize and - bufferSize * bufferBaseTypeSize = allocatedUnits - ) -} - -/** - * Compute the allocation size and the maximum accessed index for the allocation and access. - */ -bindingset[bufferSize] -predicate computeIndices( - ArrayExpr access, AllocationExpr buffer, int bufferSize, int allocatedUnits, int maxAccessedIndex -) { - exists( - int arrayTypeSize, int accessMax, int bufferBaseTypeSize, Type arrayBaseType, - int arrayBaseTypeSize, Expr accessIdx - | - // buf[...] - // ^^^^^^^^ ArrayExpr access - // ^^^ - accessIdx = access.getArrayOffset() and - upperBound(accessIdx) = accessMax and - buffer.getSizeMult() = bufferBaseTypeSize and - arrayBaseType.getSize() = arrayBaseTypeSize and - bufferSize * bufferBaseTypeSize = allocatedUnits and - access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() = arrayBaseType and - arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and - arrayTypeSize * accessMax = maxAccessedIndex - ) -} - -/** - * Gets an expression that flows to the allocation (which includes those already in the allocation) - * and has a constant value. - */ -predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { - exists(AllocationExpr buffer | - // Capture BOTH with datflow: - // 1. malloc (100) - // ^^^ bufferSize - // 2. unsigned long size = 100; ... 
; char *buf = malloc(size); - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - bufferSizeExpr.getValue().toInt() = bufferSize - ) -} -``` + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + from + AllocationExpr buffer, ArrayExpr access, int bufferSize, Expr bufferSizeExpr, + int maxAccessedIndex, int allocatedUnits + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + getAllocConstantExpr(bufferSizeExpr, bufferSize) and + // Ensure buffer access is to the correct allocation. + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure use refers to the correct size defintion, even for non-constant + // expressions. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + // computeIndices(access, buffer, bufferSize, allocatedUnits, maxAccessedIndex) + computeAllocationSize(buffer, bufferSize, allocatedUnits) and + computeMaxAccess(access, maxAccessedIndex) + // only consider out-of-bounds + and + maxAccessedIndex >= allocatedUnits + select access, + "Array access at or beyond size; have " + allocatedUnits + " units, access at " + maxAccessedIndex + + // select bufferSizeExpr, buffer, access, allocatedUnits, maxAccessedIndex + + /** + * Compute the maximum accessed index. + */ + predicate computeMaxAccess(ArrayExpr access, int maxAccessedIndex) { + exists( + int arrayTypeSize, int accessMax, Type arrayBaseType, int arrayBaseTypeSize, Expr accessIdx + | + // buf[...] + // ^^^^^^^^ ArrayExpr access + // ^^^ + accessIdx = access.getArrayOffset() and + upperBound(accessIdx) = accessMax and + arrayBaseType.getSize() = arrayBaseTypeSize and + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() = arrayBaseType and + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + arrayTypeSize * accessMax = maxAccessedIndex + ) + } + + /** + * Compute the allocation size. 
+ */ + bindingset[bufferSize] + predicate computeAllocationSize(AllocationExpr buffer, int bufferSize, int allocatedUnits) { + exists(int bufferBaseTypeSize, Type arrayBaseType, int arrayBaseTypeSize | + // buf[...] + // ^^^^^^^^ ArrayExpr access + // ^^^ + buffer.getSizeMult() = bufferBaseTypeSize and + arrayBaseType.getSize() = arrayBaseTypeSize and + bufferSize * bufferBaseTypeSize = allocatedUnits + ) + } + + /** + * Compute the allocation size and the maximum accessed index for the allocation and access. + */ + bindingset[bufferSize] + predicate computeIndices( + ArrayExpr access, AllocationExpr buffer, int bufferSize, int allocatedUnits, int maxAccessedIndex + ) { + exists( + int arrayTypeSize, int accessMax, int bufferBaseTypeSize, Type arrayBaseType, + int arrayBaseTypeSize, Expr accessIdx + | + // buf[...] + // ^^^^^^^^ ArrayExpr access + // ^^^ + accessIdx = access.getArrayOffset() and + upperBound(accessIdx) = accessMax and + buffer.getSizeMult() = bufferBaseTypeSize and + arrayBaseType.getSize() = arrayBaseTypeSize and + bufferSize * bufferBaseTypeSize = allocatedUnits and + access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType() = arrayBaseType and + arrayTypeSize = access.getArrayBase().getUnspecifiedType().(PointerType).getBaseType().getSize() and + arrayTypeSize * accessMax = maxAccessedIndex + ) + } + + /** + * Gets an expression that flows to the allocation (which includes those already in the allocation) + * and has a constant value. + */ + predicate getAllocConstantExpr(Expr bufferSizeExpr, int bufferSize) { + exists(AllocationExpr buffer | + // Capture BOTH with datflow: + // 1. malloc (100) + // ^^^ bufferSize + // 2. unsigned long size = 100; ... 
; char *buf = malloc(size); + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + bufferSizeExpr.getValue().toInt() = bufferSize + ) + } - + ### First 5 results WARNING: Unused predicate computeIndices (/Users/hohn/local/codeql-workshop-runtime-values-c/session/example7b.ql:66,11-25) -| test.c:10:5:10:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | -| test.c:20:5:20:12 | access to array | Array access at or beyond size; have 100 units, access at 100 | -| test.c:21:5:21:13 | access to array | Array access at or beyond size; have 100 units, access at 100 | -| test.c:37:5:37:17 | access to array | Array access at or beyond size; have 100 units, access at 299 | + + + +++ ++ ++ + + + + + + + + + + + + + - + + + + + + + + + + + + + +
| test.c:10:5:10:12 | access to array | Array access at or beyond size; have 100 units, access at 100 |
| test.c:20:5:20:12 | access to array | Array access at or beyond size; have 100 units, access at 100 |
| test.c:21:5:21:13 | access to array | Array access at or beyond size; have 100 units, access at 100 |
| test.c:37:5:37:17 | access to array | Array access at or beyond size; have 100 units, access at 299 |
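The numbers in these rows can be checked by hand. The following is a minimal C sketch of the arithmetic only, not of the CodeQL query itself. The function names are hypothetical; the parameters stand in for `bufferSize`, `getSizeMult()`, the element size, and `upperBound(accessIdx)` from the Step 7b solution.

```c
/* Illustrative only: mirrors the arithmetic of computeAllocationSize and
 * computeMaxAccess in the Step 7b query. All names here are hypothetical. */

/* allocatedUnits = bufferSize * sizeMult */
static long allocated_units(long buffer_size, long size_mult)
{
    return buffer_size * size_mult;
}

/* maxAccessedIndex = elementSize * upperBound(index) */
static long max_accessed_index(long element_size, long index_upper_bound)
{
    return element_size * index_upper_bound;
}

/* The query reports an access when maxAccessedIndex >= allocatedUnits. */
static int is_reported(long buffer_size, long size_mult,
                       long element_size, long index_upper_bound)
{
    return max_accessed_index(element_size, index_upper_bound)
           >= allocated_units(buffer_size, size_mult);
}
```

Assuming a size multiplier of 1 and 1-byte `char` elements, `is_reported(100, 1, 1, 99)` is 0 and `is_reported(100, 1, 1, 100)` is 1, which matches the "have 100 units, access at 100" rows. An upper bound of 299, as in the `test.c:37` row, is reported as well.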
+ + + ## Step 8 Up to now, we have dealt with constant values -```c++ -char *buf = malloc(100); -buf[0]; // COMPLIANT -``` + char *buf = malloc(100); + buf[0]; // COMPLIANT or -```c++ -unsigned long size = 100; -char *buf = malloc(size); -buf[0]; // COMPLIANT -``` + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT and statically determinable or boundable values -```c++ -char *buf = malloc(size); -if (size < 199) - { - buf[size]; // COMPLIANT - // ... - } -``` + char *buf = malloc(size); + if (size < 199) + { + buf[size]; // COMPLIANT + // ... + } -There is another statically determinable case. Examples are +There is another statically determinable case. Examples are 1. A simple expression - ```c++ - char *buf = malloc(alloc_size); - // ... - buf[alloc_size - 1]; // COMPLIANT - buf[alloc_size]; // NON_COMPLIANT - ``` + char *buf = malloc(alloc_size); + // ... + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT 2. A complex expression - ```c++ - char *buf = malloc(sz * x * y); - buf[sz * x * y - 1]; // COMPLIANT - ``` + char *buf = malloc(sz * x * y); + buf[sz * x * y - 1]; // COMPLIANT -These both have the form `malloc(e)`, `buf[e+c]`, where `e` is an `Expr` and `c` is a constant, possibly 0. Our existing queries only report known or boundable results, but here `e` is neither. +These both have the form `malloc(e)`, `buf[e+c]`, where `e` is an `Expr` and +`c` is a constant, possibly 0. Our existing queries only report known or +boundable results, but here `e` is neither. -Write a new query, re-using or modifying the existing one to handle the simple expression (case 1). +Write a new query, re-using or modifying the existing one to handle the simple +expression (case 1). Note: - We are looking at the allocation expression again, not its possible value. -- This only handles very specific cases. Constructing counterexamples is easy. +- This only handles very specific cases. Constructing counterexamples is easy. 
- We will address this in the next section. - + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - -from - AllocationExpr buffer, ArrayExpr access, Expr bufferSizeExpr, - // --- - // int maxAccessedIndex, int allocatedUnits, - // int bufferSize - int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, - Variable accessInit -where - // malloc (...) - // ^^^^^^^^^^^^ AllocationExpr buffer - // --- - // getAllocConstExpr(...) - // +++ - bufferSizeExpr = buffer.getSizeExpr() and - // Ensure buffer access refers to the matching allocation - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // Ensure buffer access refers to the matching allocation - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - // - // +++ - // base+offset - extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) and - extractBaseAndOffset(access.getArrayOffset(), accessBase, accessOffset) and - // +++ - // Same initializer variable - bufferBase.(VariableAccess).getTarget() = bufInit and - accessBase.(VariableAccess).getTarget() = accessInit and - bufInit = accessInit -// +++ -// Identify questionable differences -select buffer, bufferBase, bufferOffset, access, accessBase, accessOffset, bufInit, accessInit - -/** - * Extract base and offset from y = base+offset and y = base-offset. For others, get y and 0. - * - * For cases like - * buf[alloc_size + 1]; - * - * The more general - * buf[sz * x * y - 1]; - * requires other tools. 
- */ -bindingset[expr] -predicate extractBaseAndOffset(Expr expr, Expr base, int offset) { - offset = expr.(AddExpr).getRightOperand().getValue().toInt() and - base = expr.(AddExpr).getLeftOperand() - or - offset = -expr.(SubExpr).getRightOperand().getValue().toInt() and - base = expr.(SubExpr).getLeftOperand() - or - not expr instanceof AddExpr and - not expr instanceof SubExpr and - base = expr and - offset = 0 -} -``` - - - + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + from + AllocationExpr buffer, ArrayExpr access, Expr bufferSizeExpr, + // --- + // int maxAccessedIndex, int allocatedUnits, + // int bufferSize + int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, + Variable accessInit + where + // malloc (...) + // ^^^^^^^^^^^^ AllocationExpr buffer + // --- + // getAllocConstExpr(...) + // +++ + bufferSizeExpr = buffer.getSizeExpr() and + // Ensure buffer access refers to the matching allocation + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Ensure buffer access refers to the matching allocation + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + // + // +++ + // base+offset + extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) and + extractBaseAndOffset(access.getArrayOffset(), accessBase, accessOffset) and + // +++ + // Same initializer variable + bufferBase.(VariableAccess).getTarget() = bufInit and + accessBase.(VariableAccess).getTarget() = accessInit and + bufInit = accessInit + // +++ + // Identify questionable differences + select buffer, bufferBase, bufferOffset, access, accessBase, accessOffset, bufInit, accessInit + + /** + * Extract base and offset from y = base+offset and y = base-offset. For others, get y and 0. + * + * For cases like + * buf[alloc_size + 1]; + * + * The more general + * buf[sz * x * y - 1]; + * requires other tools. 
+ */ + bindingset[expr] + predicate extractBaseAndOffset(Expr expr, Expr base, int offset) { + offset = expr.(AddExpr).getRightOperand().getValue().toInt() and + base = expr.(AddExpr).getLeftOperand() + or + offset = -expr.(SubExpr).getRightOperand().getValue().toInt() and + base = expr.(SubExpr).getLeftOperand() + or + not expr instanceof AddExpr and + not expr instanceof SubExpr and + base = expr and + offset = 0 + } -### First 5 results -| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:19:5:19:17 | access to array | test.c:19:9:19:12 | size | -1 | test.c:15:19:15:22 | size | test.c:15:19:15:22 | size | -| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | 0 | test.c:15:19:15:22 | size | test.c:15:19:15:22 | size | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:37:5:37:17 | access to array | test.c:37:9:37:12 | size | -1 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size | + +### First 5 results - + + + +++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:19:5:19:17 | access to array | test.c:19:9:19:12 | size | -1 | test.c:15:19:15:22 | size | test.c:15:19:15:22 | size |
| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | 0 | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | 0 | test.c:15:19:15:22 | size | test.c:15:19:15:22 | size |
| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:37:5:37:17 | access to array | test.c:37:9:37:12 | size | -1 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size |
| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size |
| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | 0 | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | 0 | test.c:26:19:26:22 | size | test.c:26:19:26:22 | size |
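To make the base and offset columns above easier to read, here is a small C model of the split that `extractBaseAndOffset` performs. It is a sketch under simplifying assumptions: the `struct expr` type and its field names are hypothetical stand-ins, not CodeQL classes, and only the three shapes `var`, `var + c` and `var - c` are represented. An addition whose right operand is not a constant, for which the real predicate has no result, cannot be expressed in this model.

```c
/* Illustrative model of a size or index expression: `var`, `var + c`,
 * or `var - c`. The types and names are hypothetical, not CodeQL classes. */
enum expr_kind { EXPR_VAR, EXPR_ADD_CONST, EXPR_SUB_CONST };

struct expr {
    enum expr_kind kind;
    const char *var;   /* the variable operand */
    int constant;      /* right-hand constant; ignored for EXPR_VAR */
};

/* Mirrors extractBaseAndOffset: split an expression into base and offset. */
static void extract_base_and_offset(const struct expr *e,
                                    const char **base, int *offset)
{
    *base = e->var;
    switch (e->kind) {
    case EXPR_ADD_CONST: *offset = e->constant;  break;  /* base + c */
    case EXPR_SUB_CONST: *offset = -e->constant; break;  /* base - c */
    default:             *offset = 0;            break;  /* plain variable */
    }
}
```

For `malloc(size)` and `buf[size - 1]` from `test_const_var`, this gives `(size, 0)` and `(size, -1)`, the offsets shown in the first result row.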
 ## Interim notes

-A common issue with the `SimpleRangeAnalysis` library is handling of cases where the bounds are undeterminable at compile-time on one or more paths. For example, even though certain paths have clearly defined bounds, the range analysis library will define the `lowerBound` and `upperBound` of `val` as `INT_MIN` and `INT_MAX` respectively:
+A common issue with the `SimpleRangeAnalysis` library is handling of
+cases where the bounds are undeterminable at compile-time on one or more
+paths. For example, even though certain paths have clearly defined
+bounds, the range analysis library will define the `lowerBound` and
+`upperBound` of `val` as `INT_MIN` and `INT_MAX` respectively:

-```cpp
-int val = rand() ? rand() : 30;
-```
+    int val = rand() ? rand() : 30;

-A similar case is present in the `test_const_branch` and `test_const_branch2` test-cases. In these cases, it is necessary to augment range analysis with data-flow and restrict the bounds to the upper or lower bound of computable constants that flow to a given expression. Another approach is global value numbering, used next.
+A similar case is present in the `test_const_branch` and `test_const_branch2`
+test-cases. In these cases, it is necessary to augment range analysis with
+data-flow and restrict the bounds to the upper or lower bound of computable
+constants that flow to a given expression. Another approach is global value
+numbering, used next.

 ## Step 8a

-Find problematic accesses by reverting to some *simple* `var+const` checks using `accessOffset` and `bufferOffset`.
+Find problematic accesses by reverting to some *simple* `var+const` checks using
+`accessOffset` and `bufferOffset`.

 Note:

@@ -1153,95 +1814,212 @@ Note:

 These are addressed in the next step.
- + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis - -from - AllocationExpr buffer, ArrayExpr access, Expr bufferSizeExpr, - // --- - // int maxAccessedIndex, int allocatedUnits, - // int bufferSize - int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, - Variable accessInit -where - // malloc (...) - // ^^^^^^^^^^^^ AllocationExpr buffer - // --- - // getAllocConstExpr(...) - // +++ - bufferSizeExpr = buffer.getSizeExpr() and - // Ensure buffer access refers to the matching allocation - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // Find allocation size expression flowing to buffer. - DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and - // - // +++ - // base+offset - extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) and - extractBaseAndOffset(access.getArrayOffset(), accessBase, accessOffset) and - // +++ - // Same initializer variable - bufferBase.(VariableAccess).getTarget() = bufInit and - accessBase.(VariableAccess).getTarget() = accessInit and - bufInit = accessInit and - // +++ - // Identify questionable differences - accessOffset >= bufferOffset -select buffer, bufferBase, access, accessBase, bufInit, bufferOffset, accessInit, accessOffset - -/** - * Extract base and offset from y = base+offset and y = base-offset. For others, get y and 0. - * - * For cases like - * buf[alloc_size + 1]; - * ^^^^^^^^^^^^^^ expr - * ^^^^^^^^^^ base - * ^^^ offset - * - * The more general - * buf[sz * x * y - 1]; - * requires other tools. 
- */ -bindingset[expr] -predicate extractBaseAndOffset(Expr expr, Expr base, int offset) { - offset = expr.(AddExpr).getRightOperand().getValue().toInt() and - base = expr.(AddExpr).getLeftOperand() - or - offset = -expr.(SubExpr).getRightOperand().getValue().toInt() and - base = expr.(SubExpr).getLeftOperand() - or - not expr instanceof AddExpr and - not expr instanceof SubExpr and - base = expr and - offset = 0 -} -``` - - - + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis + + from + AllocationExpr buffer, ArrayExpr access, Expr bufferSizeExpr, + // --- + // int maxAccessedIndex, int allocatedUnits, + // int bufferSize + int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, + Variable accessInit + where + // malloc (...) + // ^^^^^^^^^^^^ AllocationExpr buffer + // --- + // getAllocConstExpr(...) + // +++ + bufferSizeExpr = buffer.getSizeExpr() and + // Ensure buffer access refers to the matching allocation + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Find allocation size expression flowing to buffer. + DataFlow::localExprFlow(bufferSizeExpr, buffer.getSizeExpr()) and + // + // +++ + // base+offset + extractBaseAndOffset(bufferSizeExpr, bufferBase, bufferOffset) and + extractBaseAndOffset(access.getArrayOffset(), accessBase, accessOffset) and + // +++ + // Same initializer variable + bufferBase.(VariableAccess).getTarget() = bufInit and + accessBase.(VariableAccess).getTarget() = accessInit and + bufInit = accessInit and + // +++ + // Identify questionable differences + accessOffset >= bufferOffset + select buffer, bufferBase, access, accessBase, bufInit, bufferOffset, accessInit, accessOffset + + /** + * Extract base and offset from y = base+offset and y = base-offset. For others, get y and 0. 
+ * + * For cases like + * buf[alloc_size + 1]; + * ^^^^^^^^^^^^^^ expr + * ^^^^^^^^^^ base + * ^^^ offset + * + * The more general + * buf[sz * x * y - 1]; + * requires other tools. + */ + bindingset[expr] + predicate extractBaseAndOffset(Expr expr, Expr base, int offset) { + offset = expr.(AddExpr).getRightOperand().getValue().toInt() and + base = expr.(AddExpr).getLeftOperand() + or + offset = -expr.(SubExpr).getRightOperand().getValue().toInt() and + base = expr.(SubExpr).getLeftOperand() + or + not expr instanceof AddExpr and + not expr instanceof SubExpr and + base = expr and + offset = 0 + } -### First 5 results -| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | test.c:15:19:15:22 | size | 0 | test.c:15:19:15:22 | size | 0 | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 0 | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 0 | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:44:9:44:21 | access to array | test.c:44:13:44:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 1 | -| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:45:9:45:21 | access to array | test.c:45:13:45:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 2 | + +### First 5 results - + + + +++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:16:17:16:22 | call to malloc | test.c:16:24:16:27 | size | test.c:21:5:21:13 | access to array | test.c:21:9:21:12 | size | test.c:15:19:15:22 | size | 0 | test.c:15:19:15:22 | size | 0 |
| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:39:5:39:13 | access to array | test.c:39:9:39:12 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 0 |
| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:43:9:43:17 | access to array | test.c:43:13:43:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 0 |
| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:44:9:44:21 | access to array | test.c:44:13:44:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 1 |
| test.c:28:17:28:22 | call to malloc | test.c:28:24:28:27 | size | test.c:45:9:45:21 | access to array | test.c:45:13:45:16 | size | test.c:26:19:26:22 | size | 0 | test.c:26:19:26:22 | size | 2 |
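The filter that produces these rows is a plain comparison of two `(base, offset)` pairs. A minimal C sketch of that comparison follows. The `struct base_offset` type and the function name are hypothetical, and variables are identified by name here, whereas the query compares `Variable` entities.

```c
#include <string.h>

/* Hypothetical (base, offset) pair, as produced by a base+offset split. */
struct base_offset {
    const char *base;  /* variable name */
    int offset;        /* signed constant added to the variable */
};

/* Mirrors the Step 8a filter: same variable, and accessOffset >= bufferOffset.
 * Returns 1 when the access is questionable, 0 otherwise. */
static int is_questionable(struct base_offset alloc, struct base_offset access)
{
    return strcmp(alloc.base, access.base) == 0
           && access.offset >= alloc.offset;
}
```

With `malloc(size)` as `{"size", 0}`, the access `buf[size - 1]` is accepted and `buf[size]` is flagged. Like the query, the sketch looks only at the two expressions and ignores control flow, which is why the accesses at `test.c:43` and `test.c:44`, which sit inside the `size < 199` guard, still appear in the results.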
 ## Step 9 – Global Value Numbering

-Range analysis won't bound `sz * x * y`, and simple equality checks don't work at the structure level, so switch to global value numbering.
+Range analysis won't bound `sz * x * y`, and simple equality checks don't work
+at the structure level, so switch to global value numbering.

-This is the case in the last test case,
+This is the case in the last test case,

     void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz)
     {
@@ -1251,98 +2029,214 @@ This is the case in the last test case,
         buf[sz * x * y + 1]; // NON_COMPLIANT
     }

-Reference:
+Reference:
+

-Global value numbering only knows that runtime values are equal; they are not comparable (`<, >, <=` etc.), and the *actual* value is not known.
+Global value numbering only knows that runtime values are equal; they
+are not comparable (`<, >, <=` etc.), and the *actual* value is not
+known.

-Global value numbering finds expressions that compute the same value, independent of structure.
+Global value numbering finds expressions that compute the same value,
+independent of structure.

-So, we look for and use *relative* values between allocation and use.
+So, we look for and use *relative* values between allocation and use.

 The relevant CodeQL constructs are

-```java
-import semmle.code.cpp.valuenumbering.GlobalValueNumbering
-...
-globalValueNumber(e) = globalValueNumber(sizeExpr) and
-e != sizeExpr
-...
-```
+    import semmle.code.cpp.valuenumbering.GlobalValueNumbering
+    ...
+    globalValueNumber(e) = globalValueNumber(sizeExpr) and
+    e != sizeExpr
+    ...

-We can use global value numbering to identify common values as a first step, but for expressions like
+We can use global value numbering to identify common values as a first step, but
+for expressions like

     buf[sz * x * y - 1]; // COMPLIANT

 we have to "evaluate" the expressions – or at least bound them.
-XX: For the cases with variable `malloc` sizes, like `test_const_branch`, GVN identifies same-value constant accesses, but we need a special case for same-value expression accesses. +XX: +For the cases with variable `malloc` sizes, like `test_const_branch`, GVN +identifies same-value constant accesses, but we need a special case for +same-value expression accesses. - + ### Solution -```java -import cpp -import semmle.code.cpp.dataflow.DataFlow -import semmle.code.cpp.valuenumbering.GlobalValueNumbering - -from - AllocationExpr buffer, ArrayExpr access, - // --- - // Expr bufferSizeExpr - // int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, - // +++ - Expr allocSizeExpr, Expr accessIdx, GVN gvnAccessIdx, GVN gvnAllocSizeExpr, int accessOffset -where - // malloc (100) - // ^^^^^^^^^^^^ AllocationExpr buffer - // buf[...] - // ^^^ ArrayExpr access - // buf[...] - // ^^^ accessIdx - accessIdx = access.getArrayOffset() and - // Find allocation size expression flowing to the allocation. 
- DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) and - // Ensure buffer access refers to the matching allocation - DataFlow::localExprFlow(buffer, access.getArrayBase()) and - // Use GVN - globalValueNumber(accessIdx) = gvnAccessIdx and - globalValueNumber(allocSizeExpr) = gvnAllocSizeExpr and - ( - // buf[size] or buf[100] - gvnAccessIdx = gvnAllocSizeExpr and - accessOffset = 0 - or - // buf[sz * x * y + 1]; - exists(AddExpr add | - accessIdx = add and - accessOffset >= 0 and - accessOffset = add.getRightOperand().(Literal).getValue().toInt() and - globalValueNumber(add.getLeftOperand()) = gvnAllocSizeExpr - ) - ) -select access, gvnAllocSizeExpr, allocSizeExpr, buffer.getSizeExpr() as allocArg, gvnAccessIdx, - accessIdx, accessOffset -``` - - - + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.valuenumbering.GlobalValueNumbering + + from + AllocationExpr buffer, ArrayExpr access, + // --- + // Expr bufferSizeExpr + // int accessOffset, Expr accessBase, Expr bufferBase, int bufferOffset, Variable bufInit, + // +++ + Expr allocSizeExpr, Expr accessIdx, GVN gvnAccessIdx, GVN gvnAllocSizeExpr, int accessOffset + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ accessIdx + accessIdx = access.getArrayOffset() and + // Find allocation size expression flowing to the allocation. 
+ DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) and + // Ensure buffer access refers to the matching allocation + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Use GVN + globalValueNumber(accessIdx) = gvnAccessIdx and + globalValueNumber(allocSizeExpr) = gvnAllocSizeExpr and + ( + // buf[size] or buf[100] + gvnAccessIdx = gvnAllocSizeExpr and + accessOffset = 0 + or + // buf[sz * x * y + 1]; + exists(AddExpr add | + accessIdx = add and + accessOffset >= 0 and + accessOffset = add.getRightOperand().(Literal).getValue().toInt() and + globalValueNumber(add.getLeftOperand()) = gvnAllocSizeExpr + ) + ) + select access, gvnAllocSizeExpr, allocSizeExpr, buffer.getSizeExpr() as allocArg, gvnAccessIdx, + accessIdx, accessOffset + + + ### First 5 results Results note: -- The allocation size of 200 is never used in an access, so the GVN match eliminates it from the result list. +- The allocation size of 200 is never used in an access, so the GVN match + eliminates it from the result list. 
- | test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:15:26:15:28 | 100 | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | 0 | - | test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:16:24:16:27 | size | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | 0 | - | test.c:38:5:38:12 | access to array | test.c:26:39:26:41 | GVN | test.c:26:39:26:41 | 100 | test.c:28:24:28:27 | size | test.c:26:39:26:41 | GVN | test.c:38:9:38:11 | 100 | 0 | - | test.c:69:5:69:19 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | allocsize | test.c:63:24:63:33 | allocsize | test.c:63:24:63:33 | GVN | test.c:69:9:69:18 | allocsize | 0 | - | test.c:73:9:73:23 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | allocsize | test.c:63:24:63:33 | allocsize | test.c:63:24:63:33 | GVN | test.c:73:13:73:22 | allocsize | 0 | - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:15:26:15:28 | 100 | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | 0 |
| test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:16:24:16:27 | size | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | 0 |
| test.c:38:5:38:12 | access to array | test.c:26:39:26:41 | GVN | test.c:26:39:26:41 | 100 | test.c:28:24:28:27 | size | test.c:26:39:26:41 | GVN | test.c:38:9:38:11 | 100 | 0 |
| test.c:69:5:69:19 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | GVN | test.c:69:9:69:18 | alloc_size | 0 |
| test.c:73:9:73:23 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | GVN | test.c:73:13:73:22 | alloc_size | 0 |
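The idea behind the matching GVN columns can be illustrated with a toy value-numbering table in C. This is a simplified sketch, not the CodeQL `GlobalValueNumbering` library: every name in it is hypothetical, it handles only constants, parameters and multiplication, and it assumes a variable simply takes the number of the value assigned to it (so `size` after `size = 100` is `vn_const(100)`).

```c
#include <stddef.h>
#include <string.h>

/* A toy value-numbering table, for illustration only. It is NOT the
 * CodeQL GlobalValueNumbering library; all names are hypothetical. */
enum vn_kind { VN_CONST, VN_PARAM, VN_MUL };

struct vn_entry {
    enum vn_kind kind;
    long a;            /* constant value, or left operand's value number */
    long b;            /* right operand's value number (VN_MUL only) */
    const char *name;  /* parameter name (VN_PARAM only) */
};

#define VN_MAX 64
static struct vn_entry vn_table[VN_MAX];
static int vn_count = 0;

/* Return the number of an existing identical entry, or add a new one. */
static int vn_intern(enum vn_kind kind, long a, long b, const char *name)
{
    for (int i = 0; i < vn_count; i++) {
        const struct vn_entry *e = &vn_table[i];
        int same_name = (name == NULL && e->name == NULL) ||
                        (name != NULL && e->name != NULL &&
                         strcmp(name, e->name) == 0);
        if (e->kind == kind && e->a == a && e->b == b && same_name)
            return i;
    }
    if (vn_count == VN_MAX)
        return -1; /* table full */
    vn_table[vn_count] = (struct vn_entry){ kind, a, b, name };
    return vn_count++;
}

static int vn_const(long value)       { return vn_intern(VN_CONST, value, 0, NULL); }
static int vn_param(const char *name) { return vn_intern(VN_PARAM, 0, 0, name); }
static int vn_mul(int lhs, int rhs)   { return vn_intern(VN_MUL, lhs, rhs, NULL); }
```

Building `sz * x * y` twice, once for the `malloc` argument and once for the index, returns the same number both times, while `sz * x` alone gets a different one. The table can only say whether two numbers are equal. It cannot order them, which is why the query falls back on the constant `accessOffset` to decide whether an access is past the end.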
+ + + ## TODO hashconsing @@ -1354,4 +2248,6 @@ hashcons: every value gets a number based on structure. Fails on sz = 100; buf[sz * x * y - 1]; // COMPLIANT -The final exercise is to implement the `isOffsetOutOfBoundsGVN` predicate to […] +The final exercise is to implement the `isOffsetOutOfBoundsGVN` +predicate to […] + From e4e2995a08106fb6e2bd395ae8e58ebf5bd63423 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 23 May 2023 12:38:38 -0700 Subject: [PATCH 26/28] Apply the solution/5 results format to all steps --- session/session.org | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/session/session.org b/session/session.org index 19faf0c..983d731 100644 --- a/session/session.org +++ b/session/session.org @@ -191,9 +191,10 @@ Standards repository]]. constant expression. *** Solution - #+INCLUDE: "example1.ql" src java + #+INCLUDE: "example1.ql" src java -This produces 12 results, with some cross-function pairs. +*** First 5 results + #+INCLUDE: "../session-tests/Example1/example1.expected" :lines "-6"’ ** Step 2 The previous query fails to connect the =malloc= calls with the array accesses, @@ -214,11 +215,10 @@ To address these, take the query from the previous exercise and =Expr.getArrayBase()= predicate. *** Solution - #+INCLUDE: "example2.ql" src java -*** Results - There are now 3 results. These are from only one function, the one using constants. +*** First 5 results + #+INCLUDE: "../session-tests/Example2/example2.expected" :lines "-6"’ ** Step 3 :PROPERTIES: @@ -241,11 +241,10 @@ To address these, take the query from the previous exercise and We include this result by removing the size-retrieval from the prior query. *** Solution - #+INCLUDE: "example3.ql" src java -*** Results - Now, we get 12 results, including some from other test cases. 
+*** First 5 results + #+INCLUDE: "../session-tests/Example3/example3.expected" :lines "-6"’ ** Step 4 We are looking for out-of-bounds accesses, so we to need to include the @@ -261,11 +260,10 @@ To address these, take the query from the previous exercise and *** Hint *** Solution - #+INCLUDE: "example4.ql" src java -*** Results - Now, we get 15 results, limited to statically determined values. +*** First 5 results + #+INCLUDE: "../session-tests/Example4/example4.expected" :lines "-6"’ ** Step 4a -- some clean-up using predicates @@ -290,9 +288,11 @@ To address these, take the query from the previous exercise and =getValue().toInt()= as one possibility (one predicate). *** Solution - #+INCLUDE: "example4a.ql" src java +*** First 5 results + #+INCLUDE: "../session-tests/Example4a/example4a.expected" :lines "-6"’ + ** Step 5 -- SimpleRangeAnalysis Running the query from Step 2 against the database yields a significant number of missing or incorrect results. The reason is that From 1c733ef7b8306378b44a2cbc4fe2f99af63b9c52 Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 23 May 2023 13:55:10 -0700 Subject: [PATCH 27/28] Introduce hashconsing in step 9a to get correct equality comparison --- session-tests/Example9a/Example9a.expected | 9 + session-tests/Example9a/Example9a.qlref | 1 + session-tests/Example9a/test.c | 85 ++ session/Example9a.ql | 64 ++ session/session.md | 864 ++++++++++++++++++--- session/session.org | 62 +- 6 files changed, 965 insertions(+), 120 deletions(-) create mode 100644 session-tests/Example9a/Example9a.expected create mode 100644 session-tests/Example9a/Example9a.qlref create mode 100644 session-tests/Example9a/test.c create mode 100644 session/Example9a.ql diff --git a/session-tests/Example9a/Example9a.expected b/session-tests/Example9a/Example9a.expected new file mode 100644 index 0000000..31e9a5f --- /dev/null +++ b/session-tests/Example9a/Example9a.expected @@ -0,0 +1,9 @@ +| test.c:21:5:21:13 | access to array | test.c:15:26:15:28 
| GVN | test.c:15:26:15:28 | 100 | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | test.c:21:9:21:12 | size | 0 | +| test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:16:24:16:27 | size | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | test.c:21:9:21:12 | size | 0 | +| test.c:38:5:38:12 | access to array | test.c:26:39:26:41 | GVN | test.c:26:39:26:41 | 100 | test.c:28:24:28:27 | size | test.c:26:39:26:41 | GVN | test.c:38:9:38:11 | 100 | test.c:38:9:38:11 | 100 | 0 | +| test.c:69:5:69:19 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | GVN | test.c:69:9:69:18 | alloc_size | test.c:69:9:69:18 | alloc_size | 0 | +| test.c:73:9:73:23 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | GVN | test.c:73:13:73:22 | alloc_size | test.c:73:13:73:22 | alloc_size | 0 | +| test.c:74:9:74:27 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:74:13:74:26 | GVN | test.c:74:13:74:26 | ... + ... | test.c:74:13:74:22 | alloc_size | 1 | +| test.c:75:9:75:27 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:75:13:75:26 | GVN | test.c:75:13:75:26 | ... + ... | test.c:75:13:75:22 | alloc_size | 2 | +| test.c:83:5:83:19 | access to array | test.c:81:24:81:33 | GVN | test.c:81:24:81:33 | ... * ... | test.c:81:24:81:33 | ... * ... | test.c:81:24:81:33 | GVN | test.c:83:9:83:18 | ... * ... | test.c:83:9:83:18 | ... * ... | 0 | +| test.c:84:5:84:23 | access to array | test.c:81:24:81:33 | GVN | test.c:81:24:81:33 | ... * ... | test.c:81:24:81:33 | ... * ... | test.c:84:9:84:22 | GVN | test.c:84:9:84:22 | ... + ... | test.c:84:9:84:18 | ... * ... 
| 1 | diff --git a/session-tests/Example9a/Example9a.qlref b/session-tests/Example9a/Example9a.qlref new file mode 100644 index 0000000..b2b29f8 --- /dev/null +++ b/session-tests/Example9a/Example9a.qlref @@ -0,0 +1 @@ +Example9a.ql diff --git a/session-tests/Example9a/test.c b/session-tests/Example9a/test.c new file mode 100644 index 0000000..6eaa405 --- /dev/null +++ b/session-tests/Example9a/test.c @@ -0,0 +1,85 @@ +void *malloc(unsigned long);/* clang compatible */ + +unsigned long extern_get_size(void); + +void test_const(void) +{ + char *buf = malloc(100); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[100]; // NON_COMPLIANT +} + +void test_const_var(void) +{ + unsigned long size = 100; + char *buf = malloc(size); + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // COMPLIANT + buf[100]; // NON_COMPLIANT + buf[size]; // NON_COMPLIANT +} + +void test_const_branch(int mode, int random_condition) +{ + unsigned long size = (mode == 1 ? 100 : 200); + + char *buf = malloc(size); + + if (random_condition) + { + size = 300; + } + + buf[0]; // COMPLIANT + buf[99]; // COMPLIANT + buf[size - 1]; // NON_COMPLIANT + buf[100]; // NON_COMPLIANT[DONT REPORT] + buf[size]; // NON_COMPLIANT + + if (size < 199) + { + buf[size]; // COMPLIANT + buf[size + 1]; // COMPLIANT + buf[size + 2]; // NON_COMPLIANT + } +} + +void test_const_branch2(int mode) +{ + unsigned long alloc_size = 0; + + if (mode == 1) + { + alloc_size = 200; + } + else + { + // unknown const size - don't report accesses + alloc_size = extern_get_size(); + } + + char *buf = malloc(alloc_size); + + buf[0]; // COMPLIANT + buf[100]; // COMPLIANT + buf[200]; // NON_COMPLIANT + buf[alloc_size - 1]; // COMPLIANT + buf[alloc_size]; // NON_COMPLIANT + + if (alloc_size < 199) + { + buf[alloc_size]; // COMPLIANT + buf[alloc_size + 1]; // COMPLIANT + buf[alloc_size + 2]; // NON_COMPLIANT + } +} + +void test_gvn_var(unsigned long x, unsigned long y, unsigned long sz) +{ + char *buf = malloc(sz * x * y); + 
buf[sz * x * y - 1]; // COMPLIANT + buf[sz * x * y]; // NON_COMPLIANT + buf[sz * x * y + 1]; // NON_COMPLIANT +} diff --git a/session/Example9a.ql b/session/Example9a.ql new file mode 100644 index 0000000..98e5f3d --- /dev/null +++ b/session/Example9a.ql @@ -0,0 +1,64 @@ +import cpp +import semmle.code.cpp.dataflow.DataFlow +import semmle.code.cpp.valuenumbering.GlobalValueNumbering +import semmle.code.cpp.valuenumbering.HashCons + +from + AllocationExpr buffer, ArrayExpr access, Expr allocSizeExpr, Expr accessIdx, GVN gvnAccessIdx, + GVN gvnAllocSizeExpr, int accessOffset, + // +++ + Expr allocArg, Expr accessBase +where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ accessIdx + accessIdx = access.getArrayOffset() and + // Find allocation size expression flowing to the allocation. + DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) and + // Ensure buffer access refers to the matching allocation + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Use GVN + globalValueNumber(accessIdx) = gvnAccessIdx and + globalValueNumber(allocSizeExpr) = gvnAllocSizeExpr and + ( + // buf[size] or buf[100] + gvnAccessIdx = gvnAllocSizeExpr and + accessOffset = 0 and + // +++ + accessBase = accessIdx + or + // buf[sz * x * y + 1]; + exists(AddExpr add | + accessIdx = add and + accessOffset >= 0 and + accessOffset = add.getRightOperand().(Literal).getValue().toInt() and + globalValueNumber(add.getLeftOperand()) = gvnAllocSizeExpr and + // +++ + accessBase = add.getLeftOperand() + ) + ) and + buffer.getSizeExpr() = allocArg and + ( + accessOffset >= 0 and + // +++ + // Illustrating the subtle meanings of equality: + // 0 results: + // (accessBase = allocSizeExpr or accessBase = allocArg) + // Only 6 results: + // ( + // gvnAccessIdx = gvnAllocSizeExpr or + // gvnAccessIdx = globalValueNumber(allocArg) + // ) + // 9 results: + ( + hashCons(accessBase) = hashCons(allocSizeExpr) or + 
hashCons(accessBase) = hashCons(allocArg) + ) + ) +// gvnAccessIdx = globalValueNumber(allocArg)) +// +++ overview select: +select access, gvnAllocSizeExpr, allocSizeExpr, allocArg, gvnAccessIdx, accessIdx, accessBase, + accessOffset diff --git a/session/session.md b/session/session.md index c8c2a0f..f5723fb 100644 --- a/session/session.md +++ b/session/session.md @@ -9,46 +9,50 @@ 6. [Session/Workshop notes](#sessionworkshop-notes) 1. [Step 1](#exercise-1) 1. [Hints](#hints) - 2. [Solution](#org14d20ad) - 2. [Step 2](#org6996134) + 2. [Solution](#orgfe07e83) + 3. [First 5 results](#org9bdf3d9) + 2. [Step 2](#org97296bf) 1. [Hints](#hints) - 2. [Solution](#orge54f273) - 3. [Results](#org7721736) + 2. [Solution](#orgd06b765) + 3. [First 5 results](#org4ffae11) 3. [Step 3](#exercise-2) - 1. [Solution](#org77a77b4) - 2. [Results](#org14b2eb8) - 4. [Step 4](#org70ec45b) - 1. [Hint](#org952151f) - 2. [Solution](#org443dc33) - 3. [Results](#org9eba298) - 5. [Step 4a – some clean-up using predicates](#orga1b1648) - 1. [Solution](#orgf6ab8fd) - 6. [Step 5 – SimpleRangeAnalysis](#orga0ae19d) - 1. [Solution](#org38203d6) - 2. [First 5 results](#org8d0b049) - 7. [Step 6](#org2e181e8) - 1. [Solution](#org7ff86a4) - 2. [First 5 results](#org35eb492) - 8. [Step 7](#orgbaba437) - 1. [Solution](#org2558217) - 2. [First 5 results](#org319c753) - 9. [Step 7a](#org5c3cbb9) - 1. [Solution](#org631c47f) - 2. [First 5 results](#orgdcbb8ea) - 10. [Step 7b](#org9b279f6) - 1. [Solution](#org54470ce) - 2. [First 5 results](#orga2d47ca) - 11. [Step 8](#orgbe1a4ba) - 1. [Solution](#org966c6c5) - 2. [First 5 results](#org9c29a8e) - 12. [Interim notes](#org39ee1c0) - 13. [Step 8a](#org477a7f7) - 1. [Solution](#orgf806ffb) - 2. [First 5 results](#org18e8bda) - 14. [Step 9 – Global Value Numbering](#org5b5e629) - 1. [Solution](#orgc717ad3) - 2. [First 5 results](#orgf97dbbc) - 15. [hashconsing](#orgc221436) + 1. [Solution](#org397729b) + 2. [First 5 results](#org9284977) + 4. 
[Step 4](#orgd659e86) + 1. [Hint](#org96b6cb3) + 2. [Solution](#org5fd27f0) + 3. [First 5 results](#org56c584d) + 5. [Step 4a – some clean-up using predicates](#org20718dc) + 1. [Solution](#orgaeb3205) + 2. [First 5 results](#org495cd47) + 6. [Step 5 – SimpleRangeAnalysis](#orgc3291f5) + 1. [Solution](#org8dfe690) + 2. [First 5 results](#orgf8f1a57) + 7. [Step 6](#orgeb37f69) + 1. [Solution](#org4018377) + 2. [First 5 results](#org009c83c) + 8. [Step 7](#orgf560236) + 1. [Solution](#org5c57a12) + 2. [First 5 results](#orgd6ee067) + 9. [Step 7a](#orge525014) + 1. [Solution](#org3c1aead) + 2. [First 5 results](#org0cc7e07) + 10. [Step 7b](#org2e90c28) + 1. [Solution](#org258a9eb) + 2. [First 5 results](#org53850c2) + 11. [Step 8](#org3478261) + 1. [Solution](#org3f5fce2) + 2. [First 5 results](#org0473a24) + 12. [Interim notes](#org1665440) + 13. [Step 8a](#org7fde7a9) + 1. [Solution](#org5b44900) + 2. [First 5 results](#orgec9f223) + 14. [Step 9 – Global Value Numbering](#org5178301) + 1. [Solution](#org9889f6d) + 2. [First 5 results](#org6d67c9e) + 15. [Step 9a – hashconsing](#org601da34) + 1. [Solution](#orgb08a8c8) + 2. [First 5 results](#org4566318) @@ -207,12 +211,12 @@ To find these issues, 3. We further extend these queries with rudimentary arithmetic support involving expressions common to the allocation and the array access. 4. For cases where constant expressions are not available or are uncertain, we - first try [range analysis](#orga0ae19d) to expand the query's applicability. + first try [range analysis](#orgc3291f5) to expand the query's applicability. 5. 
For cases where this is insufficient, we introduce global value numbering - [GVN](https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering) in [Step 9 – Global Value Numbering](#org5b5e629), to detect values known to be equal + [GVN](https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering) in [Step 9 – Global Value Numbering](#org5178301), to detect values known to be equal at runtime. 6. When *those* cases are insufficient, we handle the case of identical - structure using [hashconsing](#orgc221436). + structure using [BROKEN LINK: \*hashconsing]. @@ -247,7 +251,7 @@ in [db.c](file:///Users/hohn/local/codeql-workshop-runtime-values-c/session-db/D constant expression. - + ### Solution @@ -274,10 +278,109 @@ in [db.c](file:///Users/hohn/local/codeql-workshop-runtime-values-c/session-db/D bufferSize = allocSizeExpr.getValue().toInt() select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr -This produces 12 results, with some cross-function pairs. + - +### First 5 results + + + + +++ ++ ++ ++ ++ ++ ++ ++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 99 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:17:5:17:10 | access to array | 0 | test.c:17:9:17:9 | 0 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:18:5:18:11 | access to array | 99 | test.c:18:9:18:10 | 99 | 100 | test.c:7:24:7:26 | 100 |
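
The rows for the 100-byte buffer show the pattern that the later steps try to detect: accesses at offsets 0, 99, and 100. As a minimal sketch of the comparison the query ultimately has to express, the C helpers below classify an offset against an allocation size. The names `index_in_bounds` and `count_out_of_bounds` are hypothetical and not part of the workshop code, and an element size of 1 is assumed, as with the `char` buffers in `test.c`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the workshop: nonzero when idx is a valid
   offset into a buffer of `size` one-byte elements. */
int index_in_bounds(size_t size, size_t idx)
{
    return idx < size;
}

/* Counts how many of the n offsets fall outside a buffer of `size` bytes. */
int count_out_of_bounds(size_t size, const size_t *offsets, size_t n)
{
    int bad = 0;
    assert(offsets != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!index_in_bounds(size, offsets[i]))
            bad++;
    }
    return bad;
}
```

For the `test_const` case, `malloc(100)` followed by `buf[0]`, `buf[99]`, and `buf[100]`, only the last offset is out of bounds.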
+ + + ## Step 2 @@ -301,7 +404,7 @@ To address these, take the query from the previous exercise and `Expr.getArrayBase()` predicate. - + ### Solution @@ -341,11 +444,77 @@ To address these, take the query from the previous exercise and select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr - + + +### First 5 results + + + + +++ +-### Results +-There are now 3 results. These are from only one function, the one using constants. ++ ++ ++ ++ ++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 99 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 |
@@ -367,7 +536,7 @@ Here, the `malloc` argument is a variable with known value. We include this result by removing the size-retrieval from the prior query. - + ### Solution @@ -406,14 +575,87 @@ We include this result by removing the size-retrieval from the prior query. select buffer, access, accessIdx, access.getArrayOffset() - + + +### First 5 results + + + + +++ +-### Results ++ ++ ++ ++ ++ + + + + + + + + + + -Now, we get 12 results, including some from other test cases. + + + + + + + + + - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 0 | test.c:8:9:8:9 | 0 |
| test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 99 | test.c:9:9:9:10 | 99 |
| test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | test.c:10:9:10:11 | 100 |
| test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | 0 | test.c:17:9:17:9 | 0 |
| test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | 99 | test.c:18:9:18:10 | 99 |
+ + + ## Step 4 @@ -429,12 +671,12 @@ access rather than a constant. The next goal is We have an expression `size` that flows into the `malloc()` call. - + ### Hint - + ### Solution @@ -480,14 +722,108 @@ We have an expression `size` that flows into the `malloc()` call. select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bse - + + +### First 5 results + + + -### Results ++-Now, we get 15 results, limited to statically determined values. ++ ++ ++ ++ ++ ++ ++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 99 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 |
| test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | 0 | test.c:17:9:17:9 | 0 | 100 | test.c:15:26:15:28 | 100 |
| test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | 99 | test.c:18:9:18:10 | 99 | 100 | test.c:15:26:15:28 | 100 |
- + ## Step 4a – some clean-up using predicates @@ -513,7 +849,7 @@ Also, simplify the `from...where...select`: `getValue().toInt()` as one possibility (one predicate). - + ### Solution @@ -564,7 +900,108 @@ Also, simplify the `from...where...select`: } - + + +### First 5 results + + + + +++ ++ ++ ++ ++ ++ ++ ++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:7:17:7:22 | call to malloc | test.c:8:5:8:10 | access to array | 0 | test.c:8:9:8:9 | 0 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:9:5:9:11 | access to array | 99 | test.c:9:9:9:10 | 99 | 100 | test.c:7:24:7:26 | 100 |
| test.c:7:17:7:22 | call to malloc | test.c:10:5:10:12 | access to array | 100 | test.c:10:9:10:11 | 100 | 100 | test.c:7:24:7:26 | 100 |
| test.c:16:17:16:22 | call to malloc | test.c:17:5:17:10 | access to array | 0 | test.c:17:9:17:9 | 0 | 100 | test.c:15:26:15:28 | 100 |
| test.c:16:17:16:22 | call to malloc | test.c:18:5:18:11 | access to array | 99 | test.c:18:9:18:10 | 99 | 100 | test.c:15:26:15:28 | 100 |
+ + + ## Step 5 – SimpleRangeAnalysis @@ -605,7 +1042,7 @@ Notes: select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax - + ### Solution @@ -661,7 +1098,7 @@ Notes: } - + ### First 5 results @@ -755,7 +1192,7 @@ Notes: - + ## Step 6 @@ -786,7 +1223,7 @@ Hints: `double`? - + ### Solution @@ -844,7 +1281,7 @@ Hints: } - + ### First 5 results @@ -973,7 +1410,7 @@ Hints: - + ## Step 7 @@ -986,7 +1423,7 @@ Hints: 3. Compare them - + ### Solution @@ -1041,7 +1478,7 @@ Hints: } - + ### First 5 results @@ -1170,7 +1607,7 @@ Hints: - + ## Step 7a @@ -1178,7 +1615,7 @@ Hints: 2. Put all expressions into the select for review. - + ### Solution @@ -1230,7 +1667,7 @@ Hints: } - + ### First 5 results @@ -1373,7 +1810,7 @@ Hints: - + ## Step 7b @@ -1382,7 +1819,7 @@ Hints: 3. Report only the questionable entries. - + ### Solution @@ -1488,7 +1925,7 @@ Hints: } - + ### First 5 results @@ -1535,7 +1972,7 @@ WARNING: Unused predicate computeIndices (/Users/hohn/local/codeql-workshop-runt - + ## Step 8 @@ -1586,7 +2023,7 @@ Note: - We will address this in the next section. - + ### Solution @@ -1651,7 +2088,7 @@ Note: } - + ### First 5 results @@ -1780,7 +2217,7 @@ Note: - + ## Interim notes @@ -1799,7 +2236,7 @@ constants that flow to a given expression. Another approach is global value numbering, used next. - + ## Step 8a @@ -1814,7 +2251,7 @@ Note: These are addressed in the next step. - + ### Solution @@ -1883,7 +2320,7 @@ These are addressed in the next step. } - + ### First 5 results @@ -2012,7 +2449,7 @@ These are addressed in the next step. - + ## Step 9 – Global Value Numbering @@ -2029,9 +2466,6 @@ This is the case in the last test case, buf[sz * x * y + 1]; // NON_COMPLIANT } -Reference: - - Global value numbering only knows that runtime values are equal; they are not comparable (`<, >, <=` etc.), and the *actual* value is not known. @@ -2056,13 +2490,8 @@ for expressions like we have to "evaluate" the expressions – or at least bound them. 
-XX: -For the cases with variable `malloc` sizes, like `test_const_branch`, GVN -identifies same-value constant accesses, but we need a special case for -same-value expression accesses. - - + ### Solution @@ -2109,7 +2538,7 @@ same-value expression accesses. accessIdx, accessOffset - + ### First 5 results @@ -2236,18 +2665,249 @@ Results note: - + -## TODO hashconsing +## Step 9a – hashconsing -import semmle.code.cpp.valuenumbering.HashCons +For the cases with variable `malloc` sizes, like `test_const_branch`, GVN +identifies same-value constant accesses, but we need a special case for +same-structure expression accesses. Enter `hashCons`. -hashcons: every value gets a number based on structure. Fails on +From the reference: + - char *buf = malloc(sz * x * y); - sz = 100; - buf[sz * x * y - 1]; // COMPLIANT +> The hash consing library (defined in semmle.code.cpp.valuenumbering.HashCons) +> provides a mechanism for identifying expressions that have the same syntactic +> structure. + +Additions to the imports, and use: + + import semmle.code.cpp.valuenumbering.HashCons + ... + hashCons(expr) + +This step illustrates some subtle meanings of equality. 
In particular, there +is plain `=`, GVN, and `hashCons`: + + // 0 results: + // (accessBase = allocSizeExpr or accessBase = allocArg) + + // Only 6 results: + + // ( + // gvnAccessIdx = gvnAllocSizeExpr or + // gvnAccessIdx = globalValueNumber(allocArg) + // ) + + // 9 results: + ( + hashCons(accessBase) = hashCons(allocSizeExpr) or + hashCons(accessBase) = hashCons(allocArg) + ) + + + + +### Solution -The final exercise is to implement the `isOffsetOutOfBoundsGVN` -predicate to […] + import cpp + import semmle.code.cpp.dataflow.DataFlow + import semmle.code.cpp.valuenumbering.GlobalValueNumbering + import semmle.code.cpp.valuenumbering.HashCons + + from + AllocationExpr buffer, ArrayExpr access, Expr allocSizeExpr, Expr accessIdx, GVN gvnAccessIdx, + GVN gvnAllocSizeExpr, int accessOffset, + // +++ + Expr allocArg, Expr accessBase + where + // malloc (100) + // ^^^^^^^^^^^^ AllocationExpr buffer + // buf[...] + // ^^^ ArrayExpr access + // buf[...] + // ^^^ accessIdx + accessIdx = access.getArrayOffset() and + // Find allocation size expression flowing to the allocation. 
+ DataFlow::localExprFlow(allocSizeExpr, buffer.getSizeExpr()) and + // Ensure buffer access refers to the matching allocation + DataFlow::localExprFlow(buffer, access.getArrayBase()) and + // Use GVN + globalValueNumber(accessIdx) = gvnAccessIdx and + globalValueNumber(allocSizeExpr) = gvnAllocSizeExpr and + ( + // buf[size] or buf[100] + gvnAccessIdx = gvnAllocSizeExpr and + accessOffset = 0 and + // +++ + accessBase = accessIdx + or + // buf[sz * x * y + 1]; + exists(AddExpr add | + accessIdx = add and + accessOffset >= 0 and + accessOffset = add.getRightOperand().(Literal).getValue().toInt() and + globalValueNumber(add.getLeftOperand()) = gvnAllocSizeExpr and + // +++ + accessBase = add.getLeftOperand() + ) + ) and + buffer.getSizeExpr() = allocArg and + ( + accessOffset >= 0 and + // +++ + // Illustrating the subtle meanings of equality: + // 0 results: + // (accessBase = allocSizeExpr or accessBase = allocArg) + // Only 6 results: + // ( + // gvnAccessIdx = gvnAllocSizeExpr or + // gvnAccessIdx = globalValueNumber(allocArg) + // ) + // 9 results: + ( + hashCons(accessBase) = hashCons(allocSizeExpr) or + hashCons(accessBase) = hashCons(allocArg) + ) + ) + // gvnAccessIdx = globalValueNumber(allocArg)) + // +++ overview select: + select access, gvnAllocSizeExpr, allocSizeExpr, allocArg, gvnAccessIdx, accessIdx, accessBase, + accessOffset + + + + +### First 5 results + + + + +++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:15:26:15:28 | 100 | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | test.c:21:9:21:12 | size | 0 |
| test.c:21:5:21:13 | access to array | test.c:15:26:15:28 | GVN | test.c:16:24:16:27 | size | test.c:16:24:16:27 | size | test.c:15:26:15:28 | GVN | test.c:21:9:21:12 | size | test.c:21:9:21:12 | size | 0 |
| test.c:38:5:38:12 | access to array | test.c:26:39:26:41 | GVN | test.c:26:39:26:41 | 100 | test.c:28:24:28:27 | size | test.c:26:39:26:41 | GVN | test.c:38:9:38:11 | 100 | test.c:38:9:38:11 | 100 | 0 |
| test.c:69:5:69:19 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | GVN | test.c:69:9:69:18 | alloc_size | test.c:69:9:69:18 | alloc_size | 0 |
| test.c:73:9:73:23 | access to array | test.c:63:24:63:33 | GVN | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | alloc_size | test.c:63:24:63:33 | GVN | test.c:73:13:73:22 | alloc_size | test.c:73:13:73:22 | alloc_size | 0 |
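
The difference between structural equality and value equality can also be seen outside CodeQL. The sketch below is plain C built around a hypothetical function, `access_in_bounds_after_reassign`, which is not part of the workshop. It models a variable being reassigned between the allocation and the access. The two `sz * x * y` expressions have the same syntactic structure, which is what `hashCons` compares, yet they may evaluate to different values at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model (not workshop code) of:
 *     char *buf = malloc(sz * x * y);
 *     sz = new_sz;
 *     buf[sz * x * y - 1];
 * Returns nonzero when the access stays inside the allocation. */
int access_in_bounds_after_reassign(size_t sz, size_t x, size_t y, size_t new_sz)
{
    /* Keep the index expression from wrapping below zero. */
    assert(x > 0 && y > 0 && new_sz > 0);

    size_t alloc_size = sz * x * y; /* value at the malloc call */
    sz = new_sz;                    /* reassignment in between */
    size_t idx = sz * x * y - 1;    /* same structure, possibly a new value */
    return idx < alloc_size;
}
```

With `sz` unchanged, the access hits the last valid byte; once `sz` grows, the structurally identical index falls outside the allocation. This is presumably why the query combines `hashCons` with `localExprFlow` and GVN rather than relying on structure alone.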
diff --git a/session/session.org b/session/session.org index 983d731..8bafd54 100644 --- a/session/session.org +++ b/session/session.org @@ -499,9 +499,6 @@ To address these, take the query from the previous exercise and } #+end_example - Reference: - [[https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering/]] - Global value numbering only knows that runtime values are equal; they are not comparable (=<, >, <== etc.), and the /actual/ value is not known. @@ -527,11 +524,6 @@ To address these, take the query from the previous exercise and #+end_example we have to "evaluate" the expressions -- or at least bound them. - XX: - For the cases with variable =malloc= sizes, like =test_const_branch=, GVN - identifies same-value constant accesses, but we need a special case for - same-value expression accesses. - *** Solution #+INCLUDE: "example9.ql" src java @@ -542,16 +534,50 @@ To address these, take the query from the previous exercise and #+INCLUDE: "../session-tests/Example9/example9.expected" :lines "-6"’ -** TODO hashconsing -import semmle.code.cpp.valuenumbering.HashCons +** Step 9a -- hashconsing + For the cases with variable =malloc= sizes, like =test_const_branch=, GVN + identifies same-value constant accesses, but we need a special case for + same-structure expression accesses. Enter =hashCons=. + + From the reference: + [[https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering/]] + + #+BEGIN_QUOTE + The hash consing library (defined in semmle.code.cpp.valuenumbering.HashCons) + provides a mechanism for identifying expressions that have the same syntactic + structure. + #+END_QUOTE + + Additions to the imports, and use: + #+BEGIN_SRC java + import semmle.code.cpp.valuenumbering.HashCons + ... + hashCons(expr) + #+END_SRC + + This step illustrates some subtle meanings of equality. 
In particular, there + is plain ===, GVN, and =hashCons=: + #+BEGIN_SRC java + // 0 results: + // (accessBase = allocSizeExpr or accessBase = allocArg) + + // Only 6 results: -hashcons: every value gets a number based on structure. Fails on -#+begin_example -char *buf = malloc(sz * x * y); -sz = 100; -buf[sz * x * y - 1]; // COMPLIANT -#+end_example + // ( + // gvnAccessIdx = gvnAllocSizeExpr or + // gvnAccessIdx = globalValueNumber(allocArg) + // ) + // 9 results: + ( + hashCons(accessBase) = hashCons(allocSizeExpr) or + hashCons(accessBase) = hashCons(allocArg) + ) -The final exercise is to implement the =isOffsetOutOfBoundsGVN= -predicate to [...] + #+END_SRC + +*** Solution + #+INCLUDE: "Example9a.ql" src java + +*** First 5 results + #+INCLUDE: "../session-tests/Example9a/example9a.expected" :lines "-6"’ From c59d8bf1477bb84eadaafcf9ede081dc5e5ccb4b Mon Sep 17 00:00:00 2001 From: Michael Hohn Date: Tue, 23 May 2023 14:04:46 -0700 Subject: [PATCH 28/28] A short note on the structure of directories and their use --- session/session.md | 189 ++++++++++++++++++++++++-------------------- session/session.org | 51 +++++++----- 2 files changed, 133 insertions(+), 107 deletions(-) diff --git a/session/session.md b/session/session.md index f5723fb..9db6831 100644 --- a/session/session.md +++ b/session/session.md @@ -6,53 +6,54 @@ 3. [Setup Instructions](#setup-instructions) 4. [Introduction](#introduction) 5. [A Note on the Scope of This Workshop](#a-note-on-the-scope-of-this-workshop) -6. [Session/Workshop notes](#sessionworkshop-notes) +6. [A short note on the structure of directories and their use](#org28d73fb) +7. [Session/Workshop notes](#sessionworkshop-notes) 1. [Step 1](#exercise-1) 1. [Hints](#hints) - 2. [Solution](#orgfe07e83) - 3. [First 5 results](#org9bdf3d9) - 2. [Step 2](#org97296bf) + 2. [Solution](#org4777775) + 3. [First 5 results](#org1aa6a22) + 2. [Step 2](#org1e52aa7) 1. [Hints](#hints) - 2. [Solution](#orgd06b765) - 3. 
[First 5 results](#org4ffae11) + 2. [Solution](#org4ba1960) + 3. [First 5 results](#org61872ef) 3. [Step 3](#exercise-2) - 1. [Solution](#org397729b) - 2. [First 5 results](#org9284977) - 4. [Step 4](#orgd659e86) - 1. [Hint](#org96b6cb3) - 2. [Solution](#org5fd27f0) - 3. [First 5 results](#org56c584d) - 5. [Step 4a – some clean-up using predicates](#org20718dc) - 1. [Solution](#orgaeb3205) - 2. [First 5 results](#org495cd47) - 6. [Step 5 – SimpleRangeAnalysis](#orgc3291f5) - 1. [Solution](#org8dfe690) - 2. [First 5 results](#orgf8f1a57) - 7. [Step 6](#orgeb37f69) - 1. [Solution](#org4018377) - 2. [First 5 results](#org009c83c) - 8. [Step 7](#orgf560236) - 1. [Solution](#org5c57a12) - 2. [First 5 results](#orgd6ee067) - 9. [Step 7a](#orge525014) - 1. [Solution](#org3c1aead) - 2. [First 5 results](#org0cc7e07) - 10. [Step 7b](#org2e90c28) - 1. [Solution](#org258a9eb) - 2. [First 5 results](#org53850c2) - 11. [Step 8](#org3478261) - 1. [Solution](#org3f5fce2) - 2. [First 5 results](#org0473a24) - 12. [Interim notes](#org1665440) - 13. [Step 8a](#org7fde7a9) - 1. [Solution](#org5b44900) - 2. [First 5 results](#orgec9f223) - 14. [Step 9 – Global Value Numbering](#org5178301) - 1. [Solution](#org9889f6d) - 2. [First 5 results](#org6d67c9e) - 15. [Step 9a – hashconsing](#org601da34) - 1. [Solution](#orgb08a8c8) - 2. [First 5 results](#org4566318) + 1. [Solution](#orgffef32c) + 2. [First 5 results](#orga647c8f) + 4. [Step 4](#orgd616664) + 1. [Hint](#orga9ca0e1) + 2. [Solution](#org072f835) + 3. [First 5 results](#orgab7f021) + 5. [Step 4a – some clean-up using predicates](#org74d9df9) + 1. [Solution](#orgd5a3519) + 2. [First 5 results](#orga608103) + 6. [Step 5 – SimpleRangeAnalysis](#org426ad70) + 1. [Solution](#org7c6288c) + 2. [First 5 results](#org338b606) + 7. [Step 6](#orgca8ff14) + 1. [Solution](#orgb24ef12) + 2. [First 5 results](#orgc3c6c20) + 8. [Step 7](#orgeb7c62d) + 1. [Solution](#org8b2cfc4) + 2. [First 5 results](#orgdf5441f) + 9. 
[Step 7a](#org980fc9e) + 1. [Solution](#org7a58133) + 2. [First 5 results](#org4d2ccdb) + 10. [Step 7b](#orgf204614) + 1. [Solution](#orgb536ad8) + 2. [First 5 results](#org91089f0) + 11. [Step 8](#orgf9da811) + 1. [Solution](#org4d950d1) + 2. [First 5 results](#org012e64b) + 12. [Interim notes](#orgd8277fd) + 13. [Step 8a](#orgdf6dd57) + 1. [Solution](#org2cbb86e) + 2. [First 5 results](#org0c626de) + 14. [Step 9 – Global Value Numbering](#org8474dff) + 1. [Solution](#orga7fc0bc) + 2. [First 5 results](#orgb436331) + 15. [Step 9a – hashconsing](#orgc768b64) + 1. [Solution](#org370d1e6) + 2. [First 5 results](#orgced1d9e) @@ -179,6 +180,20 @@ library](https://github.com/github/codeql-coding-standards/blob/main/c/common/sr Standards repository](https://github.com/github/codeql-coding-standards). + + +# A short note on the structure of directories and their use + +The `exercises-tests` directories are identical to `solutions-tests`; the `exercises` directories +are a convenience for developing the queries on your own, so you can use the unit +tests as a reference. This is for full consistency with the workshop material – +the session – but you may veer off and experiment on your own. + +In that case, a simpler option is to follow the session writeup using a single +`.ql` file; the writeup has full queries and (at most) the first 5 results for +reference. + + # Session/Workshop notes @@ -211,9 +226,9 @@ To find these issues, 3. We further extend these queries with rudimentary arithmetic support involving expressions common to the allocation and the array access. 4. For cases where constant expressions are not available or are uncertain, we - first try [range analysis](#orgc3291f5) to expand the query's applicability. + first try [range analysis](#org426ad70) to expand the query's applicability. 5. 
For cases where this is insufficient, we introduce global value numbering - [GVN](https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering) in [Step 9 – Global Value Numbering](#org5178301), to detect values known to be equal + [GVN](https://codeql.github.com/docs/codeql-language-guides/hash-consing-and-value-numbering) in [Step 9 – Global Value Numbering](#org8474dff), to detect values known to be equal at runtime. 6. When *those* cases are insufficient, we handle the case of identical structure using [BROKEN LINK: \*hashconsing]. @@ -251,7 +266,7 @@ in [db.c](file:///Users/hohn/local/codeql-workshop-runtime-values-c/session-db/D constant expression. - + ### Solution @@ -279,7 +294,7 @@ in [db.c](file:///Users/hohn/local/codeql-workshop-runtime-values-c/session-db/D select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr - + ### First 5 results @@ -380,7 +395,7 @@ in [db.c](file:///Users/hohn/local/codeql-workshop-runtime-values-c/session-db/D - + ## Step 2 @@ -404,7 +419,7 @@ To address these, take the query from the previous exercise and `Expr.getArrayBase()` predicate. - + ### Solution @@ -444,7 +459,7 @@ To address these, take the query from the previous exercise and select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, allocSizeExpr - + ### First 5 results @@ -536,7 +551,7 @@ Here, the `malloc` argument is a variable with known value. We include this result by removing the size-retrieval from the prior query. - + ### Solution @@ -575,7 +590,7 @@ We include this result by removing the size-retrieval from the prior query. select buffer, access, accessIdx, access.getArrayOffset() - + ### First 5 results @@ -655,7 +670,7 @@ We include this result by removing the size-retrieval from the prior query. - + ## Step 4 @@ -671,12 +686,12 @@ access rather than a constant. The next goal is We have an expression `size` that flows into the `malloc()` call. 
- + ### Hint - + ### Solution @@ -722,7 +737,7 @@ We have an expression `size` that flows into the `malloc()` call. select buffer, access, accessIdx, access.getArrayOffset(), bufferSize, bse - + ### First 5 results @@ -823,7 +838,7 @@ We have an expression `size` that flows into the `malloc()` call. - + ## Step 4a – some clean-up using predicates @@ -849,7 +864,7 @@ Also, simplify the `from...where...select`: `getValue().toInt()` as one possibility (one predicate). - + ### Solution @@ -900,7 +915,7 @@ Also, simplify the `from...where...select`: } - + ### First 5 results @@ -1001,7 +1016,7 @@ Also, simplify the `from...where...select`: - + ## Step 5 – SimpleRangeAnalysis @@ -1042,7 +1057,7 @@ Notes: select bufferSizeExpr, buffer, access, accessIdx, upperBound(accessIdx) as accessMax - + ### Solution @@ -1098,7 +1113,7 @@ Notes: } - + ### First 5 results @@ -1192,7 +1207,7 @@ Notes: - + ## Step 6 @@ -1223,7 +1238,7 @@ Hints: `double`? - + ### Solution @@ -1281,7 +1296,7 @@ Hints: } - + ### First 5 results @@ -1410,7 +1425,7 @@ Hints: - + ## Step 7 @@ -1423,7 +1438,7 @@ Hints: 3. Compare them - + ### Solution @@ -1478,7 +1493,7 @@ Hints: } - + ### First 5 results @@ -1607,7 +1622,7 @@ Hints: - + ## Step 7a @@ -1615,7 +1630,7 @@ Hints: 2. Put all expressions into the select for review. - + ### Solution @@ -1667,7 +1682,7 @@ Hints: } - + ### First 5 results @@ -1810,7 +1825,7 @@ Hints: - + ## Step 7b @@ -1819,7 +1834,7 @@ Hints: 3. Report only the questionable entries. - + ### Solution @@ -1925,7 +1940,7 @@ Hints: } - + ### First 5 results @@ -1972,7 +1987,7 @@ WARNING: Unused predicate computeIndices (/Users/hohn/local/codeql-workshop-runt - + ## Step 8 @@ -2023,7 +2038,7 @@ Note: - We will address this in the next section. - + ### Solution @@ -2088,7 +2103,7 @@ Note: } - + ### First 5 results @@ -2217,7 +2232,7 @@ Note: - + ## Interim notes @@ -2236,7 +2251,7 @@ constants that flow to a given expression. Another approach is global value numbering, used next. 
- + ## Step 8a @@ -2251,7 +2266,7 @@ Note: These are addressed in the next step. - + ### Solution @@ -2320,7 +2335,7 @@ These are addressed in the next step. } - + ### First 5 results @@ -2449,7 +2464,7 @@ These are addressed in the next step. - + ## Step 9 – Global Value Numbering @@ -2491,7 +2506,7 @@ for expressions like we have to "evaluate" the expressions – or at least bound them. - + ### Solution @@ -2538,7 +2553,7 @@ we have to "evaluate" the expressions – or at least bound them. accessIdx, accessOffset - + ### First 5 results @@ -2665,7 +2680,7 @@ Results note: - + ## Step 9a – hashconsing @@ -2706,7 +2721,7 @@ is plain `=`, GVN, and `hashCons`: ) - + ### Solution @@ -2776,7 +2791,7 @@ is plain `=`, GVN, and `hashCons`: accessOffset - + ### First 5 results diff --git a/session/session.org b/session/session.org index 8bafd54..4a94f73 100644 --- a/session/session.org +++ b/session/session.org @@ -106,26 +106,37 @@ results in an access beyond the allocated size of the buffer. :PROPERTIES: :CUSTOM_ID: a-note-on-the-scope-of-this-workshop :END: -This workshop is not intended to be a complete analysis that is useful -for real-world cases of out-of-bounds analyses for reasons including but -not limited to: - -- Missing support for loops and recursion -- No interprocedural analysis -- Missing size calculation of arrays where the element size is not 1 -- No support for pointer arithmetic or in general, operations other than - addition and subtraction -- Overly specific modelling of a buffer access as an array expression - -The goal of this workshop is rather to demonstrate the building blocks -of analyzing run-time values and how to apply those building blocks to -modelling a common class of vulnerability. 
A more comprehensive and -production-appropriate example is the -[[https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll][OutOfBounds.qll -library]] from the -[[https://github.com/github/codeql-coding-standards][CodeQL Coding -Standards repository]]. - + This workshop is not intended to be a complete analysis that is useful + for real-world cases of out-of-bounds analyses for reasons including but + not limited to: + + - Missing support for loops and recursion + - No interprocedural analysis + - Missing size calculation of arrays where the element size is not 1 + - No support for pointer arithmetic or in general, operations other than + addition and subtraction + - Overly specific modelling of a buffer access as an array expression + + The goal of this workshop is rather to demonstrate the building blocks + of analyzing run-time values and how to apply those building blocks to + modelling a common class of vulnerability. A more comprehensive and + production-appropriate example is the + [[https://github.com/github/codeql-coding-standards/blob/main/c/common/src/codingstandards/c/OutOfBounds.qll][OutOfBounds.qll + library]] from the + [[https://github.com/github/codeql-coding-standards][CodeQL Coding + Standards repository]]. + +* A short note on the structure of directories and their use + + The =exercises-tests= directories are identical to =solutions-tests=; the =exercises= directories + are a convenience for developing the queries on your own, so you can use the unit + tests as a reference. This is for full consistency with the workshop material -- + the session -- but you may veer off and experiment on your own. + + In that case, a simpler option is to follow the session writeup using a single + =.ql= file; the writeup has full queries and (at most) the first 5 results for + reference. + * Session/Workshop notes :PROPERTIES: :CUSTOM_ID: sessionworkshop-notes