Commit deade1c

Add extractors README with CDS mermaid diagram
Adds the `extractors/README.md` file as an evolution of, and a replacement for, PR #168 from this `advanced-security/codeql-sap-js` repository. Updates the CDS tools documentation to reflect progress in the multi-stage process of rewriting the CDS extractor to be more maintainable (WIP) and performant (TODO).
1 parent bc89221 commit deade1c

2 files changed

+89
-6
lines changed

extractors/README.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# `advanced-security/codeql-sap-js` : `extractors/README.md`

## CodeQL CDS Extractor : Overview

The CodeQL CDS Extractor is a specialized component designed to process and analyze Core Data Services (CDS) files used in SAP Cloud Application Programming (CAP) model applications. This extractor enables CodeQL's static analysis capabilities to detect security vulnerabilities, bugs, and quality issues in CDS files.

Key capabilities of the extractor include:
- Compiling `.cds` files to an intermediate JSON representation
- Handling SAP CAP dependencies and managing compiler versions
- Integrating with the JavaScript extractor for comprehensive analysis
- Converting CDS code to CodeQL's TRAP format for database inclusion
- Supporting both Windows and Unix-like environments through platform-specific wrapper scripts

The extractor operates as an extension to the JavaScript extractor, complementing its ability to analyze JavaScript, TypeScript, and JSON files with support for the CDS domain-specific language.

## CodeQL CDS Extractor : Flowchart

The following flowchart shows the flow of execution for the current implementation of the extractor.

```mermaid
flowchart TD
COM["`export _build_cmd=<br>$(pwd)/extractors/
javascript/tools/
pre-finalize.sh`"]
DCR[codeql database create<br>--command=$_build_cmd<br>--language=javascript<br>--search-path=./extractors/<br>--<br>/path/to/database]
DB[(CodeQL Database)]
DINIT[codeql database init]
CRE[codeql resolve extractor]
JSE[[javascript extractor]]
DTRAC[codeql database<br>trace-command]
SPF[[pre-finalize.sh]]
DIDX[codeql database index-files<br> --language=cds<br>--include-extension=.cds]
SIF[[index-files.sh]]
SIT[[index-files.ts/js]]
NPM[[npm install & build]]
DETS[[Determine CDS command]]
FIND[[Find package.json dirs]]
INST[[Install dependencies]]
CC[[cds compiler]]
CDJ([.cds.json files])
JSA[[javascript extractor<br>autobuild script]]
TF([CodeQL TRAP files])
DBF[codeql database finalize<br> -- /path/to/database]

COM ==> DCR
DCR ==> |run internal CLI<br>plumbing command| DINIT
DINIT ----> |--language=javascript| CRE
CRE -..-> |/extractor/path/javascript| DINIT
DINIT -.initialize database.-> DB

DINIT ==> |run the<br>javascript extractor| JSE
JSE -.-> |extract javascript files:<br>_.html, .js, .json, .ts_| DB
JSE ==> |run autobuild within<br>the javascript extractor| DTRAC

DTRAC ==> |run the build --command| SPF
SPF ==> |run codeql index-files<br>for CDS files| DIDX
DIDX ==> |invoke script via<br>--search-path| SIF
SIF ==> |runs TypeScript version<br>after npm install| NPM
NPM ==> |executes compiled<br>index-files.js| SIT

SIT ==> |finds project directories<br>with package.json| FIND
FIND ==> |install CDS dependencies<br>in project directories| INST
SIT ==> |determines which<br>cds command to use| DETS
DETS ==> |processes each CDS file| CC

CC ==> |compile .cds files to<br>create .cds.json files| CDJ
CDJ -.-> |stored in same location<br>as original .cds files| DB

SIT ==> |configures extraction<br>filters for JSON files| JSA
JSA ==> |processes .cds.json files<br>via javascript extractor| CDJ

CDJ ==> |javascript extractor<br>generates TRAP files| TF
TF ==> |imported during<br>database finalization| DBF
DBF ==> |finalize database and<br>cleanup temporary files| DB
```
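
As a minimal sketch of the entry point of the flowchart, the following TypeScript assembles the `codeql database create` arguments shown in the `COM` and `DCR` nodes, and maps a `.cds` file to the `.cds.json` file that the `CC` step writes next to it. The function names `databaseCreateArgs` and `cdsJsonPathFor` are hypothetical and are not part of the extractor; only the flags and paths are taken from the diagram.

```typescript
// Hypothetical helpers for illustration; only the flags and paths come from the flowchart.

/** Build the argument list for `codeql database create` as drawn in the DCR node. */
function databaseCreateArgs(repoRoot: string, databasePath: string): string[] {
  // The build command is the pre-finalize.sh script that triggers CDS indexing.
  const buildCommand = `${repoRoot}/extractors/javascript/tools/pre-finalize.sh`;
  return [
    'database',
    'create',
    `--command=${buildCommand}`,
    '--language=javascript',
    `--search-path=${repoRoot}/extractors/`,
    '--',
    databasePath,
  ];
}

/** Map a `.cds` source file to the `.cds.json` file stored in the same location. */
function cdsJsonPathFor(cdsFile: string): string {
  if (!cdsFile.endsWith('.cds')) {
    throw new Error(`Not a .cds file: ${cdsFile}`);
  }
  return `${cdsFile}.json`;
}

// Example: print the command line for a checkout in the current directory.
console.log(['codeql', ...databaseCreateArgs('.', '/path/to/database')].join(' '));
```

Keeping the arguments as an array rather than a single string avoids shell-quoting problems if the list is later handed to a process-spawning API.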

extractors/cds/tools/autobuild.md

Lines changed: 14 additions & 6 deletions
@@ -14,17 +14,25 @@ This document is meant to be a common reference and a project guide while the it
1414

1515
The current extractor for [CDS] is based on `index-files`, which has several limitations and challenges:
1616

17-
1. **Testability**: The current extractor is difficult to test, and especially difficult to troubleshoot when tests fail. The current implementation lacks unit tests and relies heavily on integration tests that run in a post-commit GitHub Actions workflow, which makes it harder to trace errors back to their source and adds significant delay to the development process.
17+
1. **Testability**
1818

19-
2. **Performance**: The current extractor is slow and inefficient, especially when dealing with large projects or complex [CDS] files. This is due to the way `index-files` processes files, which can lead to long processing times and increased resource usage. Several performance improvements could be made to the extractor, and they all come down to avoiding work that we either do not need to do or that has already been done.
19+
The current extractor is difficult to test, and especially difficult to troubleshoot when tests fail. The current implementation lacks unit tests and relies heavily on integration tests that run in a post-commit GitHub Actions workflow, which makes it harder to trace errors back to their source and adds significant delay to the development process.
20+
21+
2. **Performance**
22+
23+
The current extractor is slow and inefficient, especially when dealing with large projects or complex [CDS] files. This is due to the way `index-files` processes files, which can lead to long processing times and increased resource usage. Several performance improvements could be made to the extractor, and they all come down to avoiding work that we either do not need to do or that has already been done.
2024

2125
- As one example of a performance problem, using the `index-files` approach means that we are provided with a list of all `.cds` files in the project and are expected to index them all, which makes sense for CodeQL (as we want our database to have a copy of every in-scope source code file) but is horribly inefficient from a [CDS] perspective as the [CDS] format allows for a single file to contain multiple [CDS] definitions. The extractor is expected to be able to handle this by parsing the declarative syntax of the `.cds` file in order to understand which other `.cds` files are to be imported as part of that top-level file, meaning that we are expected to avoid duplicate imports of files that are already (and only) used as library-style imports in top-level (project-level) [CDS] files. This is a non-trivial task, and the current extractor does not even try to parse the contents of the `.cds` files to determine which files are actually used in the project. Instead, it simply imports all `.cds` files that are found in the project, which can lead to duplicate imports and increased processing times.
2226

2327
- Another example of a performance problem is that the current `index-files`-based extractor spends a lot of time installing node dependencies because it runs an `npm install` command in every "CDS project directory" that it finds, which is every directory that contains a `package.json` file and either directly contains a `.cds` file (as a sibling of the `package.json` file) or contains some subdirectory that contains either a `.cds` file or a subdirectory that contains a `.cds` file. This means that the extractor will install these dependencies in a directory that we would rather not make changes in just to be able to use a specific version of `@sap/cds` and/or `@sap/cds-dk` (the dependencies that are needed to run the extractor). This also means that if we have five projects that all use the same version of `@sap/cds` and/or `@sap/cds-dk`, we will install that version five separate times in five separate locations, which is both a waste of time and creates a cleanup challenge as the install makes changes to the `package-lock.json` file in each of those five project directories (and also makes changes to the `node_modules` subdirectory of each project directory).
2428

25-
3. **Modularity**: The current extractor is mostly just one giant script, aka [index-files.js](./index-files.js), which is surrounded by a collection of small wrapper scripts (aka [index-files.sh](./index-files.sh) and [index-files.cmd](./index-files.cmd)) that are used to allow the JavaScript code to be run in different environments (i.e. Windows and Unix-like environments). While we cannot really get away from the wrapper scripts, we should refactor the "one giant script" (in a single `index-files.js` file) into a more modular design that allows us to break the extractor into smaller, more manageable pieces.
29+
3. **Modularity**
30+
31+
~~The current extractor is mostly just one giant script, aka [index-files.js](./index-files.js), which is surrounded by a collection of small wrapper scripts (aka [index-files.sh](./index-files.sh) and [index-files.cmd](./index-files.cmd)) that are used to allow the JavaScript code to be run in different environments (i.e. Windows and Unix-like environments). While we cannot really get away from the wrapper scripts, we should refactor the "one giant script" (in a single `index-files.js` file) into a more modular design that allows us to break the extractor into smaller, more manageable pieces.~~
32+
33+
4. **Maintainability**
2634

27-
4. **Maintainability**: The current implementation is lacking in terms of mandating consistent code style and best practices. For example, there are no linting rules applied or any scripts for applying consistent code style. This makes it difficult to maintain the code at a consistent level of quality, where it would be much better to have basic linting applied as a pre-commit task (i.e. to be performed in the developer's IDE). The current implementation also lacks documentation, which makes it difficult for new developers to understand how the extractor works and how to contribute to it.
35+
~~The current implementation is lacking in terms of mandating consistent code style and best practices. For example, there are no linting rules applied or any scripts for applying consistent code style. This makes it difficult to maintain the code at a consistent level of quality, where it would be much better to have basic linting applied as a pre-commit task (i.e. to be performed in the developer's IDE). The current implementation also lacks documentation, which makes it difficult for new developers to understand how the extractor works and how to contribute to it.~~
2836

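
The repeated-install problem described under **Performance** can be reduced by first identifying the "CDS project directories" and then grouping them by the dependency versions they request, so that each distinct combination is installed once. The following is a minimal sketch under stated assumptions: file paths are POSIX-style and relative to the source root, and each project's requested `@sap/cds` and `@sap/cds-dk` versions have already been read from its `package.json`. The names `CdsProject`, `findCdsProjectDirs`, and `groupByCdsVersions` are hypothetical and do not exist in the extractor.

```typescript
// Hypothetical types and helpers for illustration only.

interface CdsProject {
  dir: string; // directory that contains package.json
  cdsVersion: string; // requested @sap/cds version, as written in package.json
  cdsDkVersion: string; // requested @sap/cds-dk version, as written in package.json
}

/**
 * Return every directory that has a package.json and contains a .cds file,
 * either directly or anywhere below it.
 */
function findCdsProjectDirs(files: string[]): string[] {
  const dirOf = (p: string): string =>
    p.includes('/') ? p.slice(0, p.lastIndexOf('/')) : '.';
  const packageDirs = files
    .filter(f => f === 'package.json' || f.endsWith('/package.json'))
    .map(dirOf);
  const cdsFiles = files.filter(f => f.endsWith('.cds'));
  return packageDirs
    .filter(dir => cdsFiles.some(f => dir === '.' || f.startsWith(`${dir}/`)))
    .sort();
}

/**
 * Group projects by the dependency versions they need, so that each
 * combination only has to be installed once.
 */
function groupByCdsVersions(projects: CdsProject[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const project of projects) {
    const key = `@sap/cds@${project.cdsVersion} @sap/cds-dk@${project.cdsDkVersion}`;
    const dirs = groups.get(key) ?? [];
    dirs.push(project.dir);
    groups.set(key, dirs);
  }
  return groups;
}
```

With five projects that request the same versions, `groupByCdsVersions` returns a single group of five directories, which corresponds to one install instead of five. Where that single install should live, so that project directories are left unmodified, is a separate design decision that this sketch does not address.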
2937
## Goals for the Future Extractor (using `autobuild`)
3038

@@ -48,9 +56,9 @@ All other goals are secondary to and/or in support of the above two goals.
4856

4957
- The new [autobuild.ts](./autobuild.ts) script will be kept as minimal as possible, with object-oriented code patterns used to encapsulate the functionality of the extractor in `.ts` files stored in a new `src` directory (project path would be `extractors/cds/tools/src`). This will allow us to break the extractor into smaller, more manageable pieces, and will also make it easier to test and maintain the code over time. The new `src` directory will contain all of the TypeScript code for the extractor, and will be organized into subdirectories based on functionality. For example, we might have a `parsers` subdirectory for parsing code, a `utils` subdirectory for utility functions, and so on. This will allow us to keep the code organized and easy to navigate.
5058

51-
- Use TypeScript as the primary language for the extractor, rather than JavaScript. We will still be using JavaScript when running the extractor, but we will develop the extractor in TypeScript and then compile it to JavaScript for use in the CodeQL extractor. This will allow us to take advantage of TypeScript's type system and other features that make it easier to write, test, and maintain code, and to catch errors at compile time rather than at runtime, which will make the extractor more robust.
59+
- ~~Use TypeScript as the primary language for the extractor, rather than JavaScript. We will still be using JavaScript when running the extractor, but we will develop the extractor in TypeScript and then compile it to JavaScript for use in the CodeQL extractor. This will allow us to take advantage of TypeScript's type system and other features that make it easier to write, test, and maintain code, and to catch errors at compile time rather than at runtime, which will make the extractor more robust.~~
5260

53-
- Add unit tests for everything that can be unit tested. This will allow us to catch errors early in the development process and make it easier to maintain the code over time. We will use a combination of testing frameworks to test the extractor as part of the pre-commit build process. Setting up such unit tests will require modifications to the `package.json` file to include the necessary dependencies and scripts for running the tests. We will also need to set up a testing framework, such as Jest or Mocha, to run the tests and report the results. To support all of this, we will create unit tests under a new `test` directory (project path would be `extractors/cds/tools/test`) that will contain all of the unit tests for the extractor. The test directory will be organized into subdirectories based on functionality and mirroring the structure of the `src` directory. For example, if we add a `src/parsers/cdsParser.ts` file, we will also add a `test/parsers/cdsParser.test.ts` file that contains the unit tests for the `cdsParser.ts` file. This will allow us to keep the tests organized and easy to navigate.
61+
- Add unit tests for everything that can be unit tested. This will allow us to catch errors early in the development process and make it easier to maintain the code over time. We will use a combination of testing frameworks to test the extractor as part of the pre-commit build process. ~~Setting up such unit tests will require modifications to the `package.json` file to include the necessary dependencies and scripts for running the tests. We will also need to set up a testing framework, such as Jest or Mocha, to run the tests and report the results. To support all of this, we will create unit tests under a new `test` directory (project path would be `extractors/cds/tools/test`) that will contain all of the unit tests for the extractor. The test directory will be organized into subdirectories based on functionality and mirroring the structure of the `src` directory. For example, if we add a `src/parsers/cdsParser.ts` file, we will also add a `test/parsers/cdsParser.test.ts` file that contains the unit tests for the `cdsParser.ts` file.~~ This will allow us to keep the tests organized and easy to navigate.
5462

5563
## Examples of Improved [CDS] Parsing
5664

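
As one hedged illustration of the import-aware parsing discussed earlier in this document, the sketch below scans `.cds` sources for `using ... from '...'` statements and reports only the files that no other file imports, which are the candidates for top-level compilation. It makes several simplifying assumptions: imports are matched with a regular expression rather than a real [CDS] parser, comments are not stripped, only relative paths are resolved, and a missing extension is assumed to mean `.cds`. The function names are hypothetical and are not part of the extractor.

```typescript
// Hypothetical helpers; a regex-based approximation, not a real CDS parser.

/** Extract the module paths named in `using ... from '...'` statements. */
function parseCdsImports(source: string): string[] {
  const pattern = /\busing\b[^;]*?\bfrom\s+['"]([^'"]+)['"]/g;
  const imports: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    imports.push(match[1]);
  }
  return imports;
}

/** Resolve a relative import against the importing file; returns null for package-style imports. */
function resolveImport(fromFile: string, spec: string): string | null {
  if (!spec.startsWith('.')) return null;
  const baseDir = fromFile.includes('/') ? fromFile.slice(0, fromFile.lastIndexOf('/')) : '';
  const parts = baseDir ? baseDir.split('/') : [];
  for (const segment of spec.split('/')) {
    if (segment === '.' || segment === '') continue;
    if (segment === '..') parts.pop();
    else parts.push(segment);
  }
  const joined = parts.join('/');
  // Assumption: an import without an extension refers to a .cds file.
  return joined.endsWith('.cds') ? joined : `${joined}.cds`;
}

/** Return the files that are not imported by any other file in the set. */
function findRootCdsFiles(sources: Record<string, string>): string[] {
  const imported = new Set<string>();
  for (const file of Object.keys(sources)) {
    for (const spec of parseCdsImports(sources[file])) {
      const target = resolveImport(file, spec);
      if (target !== null && target in sources) imported.add(target);
    }
  }
  return Object.keys(sources).filter(file => !imported.has(file)).sort();
}
```

For a project where `srv/service.cds` imports `../db/schema` and `db/schema.cds` imports `./common`, only `srv/service.cds` is reported as a root, so the two library-style files would not need to be compiled separately.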