Commit 152f2a3

Refactor index-files.ts script for modularity
Refactors the index-files.ts script to follow an object-oriented programming design, with the bulk of the script logic abstracted into reusable functions in the new `extractors/cds/tools/src` directory. Adds a `.prettierrc.js` and updates the `.eslintrc.js` and `package.json` files for the CDS extractor to provide better linting coverage and rules for the TypeScript (`.ts`) code in the CDS extractor sub-project.
1 parent cb04d21 commit 152f2a3

17 files changed: 3420 additions & 741 deletions

extractors/cds/tools/.eslintrc.js

Lines changed: 71 additions & 6 deletions
@@ -2,23 +2,88 @@ module.exports = {
22
parser: '@typescript-eslint/parser',
33
extends: [
44
'eslint:recommended',
5-
'plugin:@typescript-eslint/recommended'
5+
'plugin:@typescript-eslint/recommended',
6+
'plugin:@typescript-eslint/recommended-requiring-type-checking',
7+
'plugin:import/errors',
8+
'plugin:import/warnings',
9+
'plugin:import/typescript',
10+
'plugin:prettier/recommended'
11+
],
12+
plugins: [
13+
'@typescript-eslint',
14+
'import',
15+
'prettier'
616
],
7-
plugins: ['@typescript-eslint'],
817
env: {
918
node: true,
1019
es2018: true
1120
},
1221
ignorePatterns: [
1322
'index-files.js*',
14-
'node_modules'
23+
'node_modules',
24+
'*.js.map',
25+
'*.d.ts'
1526
],
1627
rules: {
28+
// General rules
1729
'no-console': 'off',
18-
'@typescript-eslint/explicit-module-boundary-types': 'off'
30+
'no-duplicate-imports': 'error',
31+
'no-unused-vars': 'off', // Using TypeScript's version
32+
'no-use-before-define': 'off', // Using TypeScript's version
33+
'no-trailing-spaces': 'error', // Prevent trailing spaces
34+
35+
// TypeScript rules
36+
'@typescript-eslint/explicit-module-boundary-types': 'off',
37+
'@typescript-eslint/no-unused-vars': ['warn', {
38+
'argsIgnorePattern': '^_',
39+
'varsIgnorePattern': '^_'
40+
}],
41+
'@typescript-eslint/no-use-before-define': ['error', {
42+
'functions': false,
43+
'classes': true
44+
}],
45+
'@typescript-eslint/explicit-function-return-type': ['warn', {
46+
'allowExpressions': true,
47+
'allowTypedFunctionExpressions': true
48+
}],
49+
'@typescript-eslint/no-explicit-any': 'warn',
50+
'@typescript-eslint/ban-ts-comment': 'warn',
51+
'@typescript-eslint/prefer-nullish-coalescing': 'warn',
52+
'@typescript-eslint/prefer-optional-chain': 'warn',
53+
54+
// Import rules
55+
'import/order': [
56+
'error',
57+
{
58+
'groups': ['builtin', 'external', 'internal', ['parent', 'sibling'], 'index'],
59+
'newlines-between': 'always',
60+
'alphabetize': { 'order': 'asc', 'caseInsensitive': true }
61+
}
62+
],
63+
'import/no-duplicates': 'error',
64+
65+
// Code style
66+
'prettier/prettier': ['error', {
67+
'singleQuote': true,
68+
'trailingComma': 'all',
69+
'printWidth': 100,
70+
'tabWidth': 2
71+
}]
1972
},
2073
parserOptions: {
2174
ecmaVersion: 2018,
22-
sourceType: 'module'
75+
sourceType: 'module',
76+
project: './tsconfig.json'
77+
},
78+
settings: {
79+
'import/resolver': {
80+
'typescript': {
81+
'alwaysTryTypes': true,
82+
'project': './tsconfig.json'
83+
},
84+
'node': {
85+
'extensions': ['.js', '.jsx', '.ts', '.tsx']
86+
}
87+
}
2388
}
24-
};
89+
}
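
As a rough illustration of what the stricter TypeScript rules in this config ask for, the sketch below shows code that should pass `explicit-function-return-type`, `no-unused-vars` (with the `^_` ignore pattern), `prefer-nullish-coalescing`, and `prefer-optional-chain`. The `ExtractorOptions` type and both function names are hypothetical and are not taken from this commit.

```typescript
// Hypothetical options type, used only for illustration.
interface ExtractorOptions {
  sourceRoot?: string;
  compiler?: { version?: string };
}

// Explicit return type: satisfies '@typescript-eslint/explicit-function-return-type'.
// The unused second parameter starts with '_', so it matches 'argsIgnorePattern'.
function resolveSourceRoot(
  options: ExtractorOptions,
  _env: Record<string, string | undefined>,
): string {
  // '??' falls back only on null/undefined, so an empty string is preserved.
  return options.sourceRoot ?? '.';
}

// Optional chaining replaces nested 'options && options.compiler && ...' checks.
function compilerVersion(options?: ExtractorOptions): string {
  return options?.compiler?.version ?? 'latest';
}
```

Note the difference from `||`: with `??`, a caller that deliberately passes an empty `sourceRoot` gets the empty string back rather than the fallback.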

extractors/cds/tools/.prettierrc.js

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
module.exports = {
2+
semi: true,
3+
trailingComma: 'all',
4+
singleQuote: true,
5+
printWidth: 100,
6+
tabWidth: 2,
7+
endOfLine: 'auto',
8+
arrowParens: 'avoid',
9+
// NOTE: `trailingSpaces` is not a documented Prettier option; Prettier already drops trailing whitespace when it reprints code
10+
trailingSpaces: false,
11+
};

extractors/cds/tools/autobuild.md

Lines changed: 29 additions & 12 deletions
@@ -1,17 +1,23 @@
11
# CodeQL CDS Extractor `autobuild` Re-write Guide
22

3+
## Goals
4+
5+
The primary goals of this project are to create a more robust, well-tested, and maintainable CodeQL extractor for `.cds` files that implement [Core Data Services][CDS] (CDS) as part of the [Cloud Application Programming][CAP] (CAP) model.
6+
37
## Overview
48

5-
This document provides a guide for the multi-step process of re-writing the CodeQL extractor for CDS by using an approach based on `autobuild` rather than `index-files`. This document is meant to be a common reference and a project guide while the iterative re-write is in-progress, especially since there is more to this project than a simple re-write of the scripts that comprise CodeQL's extractor (tool) for Core Data Services (CDS). The goal of this project is to create a more robust, well tested, and maintainable extractor for CDS.
9+
This document provides a guide for the multi-step process of re-writing the CodeQL extractor for [CDS] by using an approach based on `autobuild` rather than `index-files`.
10+
11+
This document is meant to be a common reference and a project guide while the iterative re-write is in-progress, especially since there is more to this project than a simple re-write of the scripts that comprise CodeQL's extractor (tool) for [CDS].
612

713
## Challenges with the Current Extractor (using `index-files`)
814

9-
The current extractor for CDS is based on `index-files`, which has several limitations and challenges:
15+
The current extractor for [CDS] is based on `index-files`, which has several limitations and challenges:
1016

1117
1. **Testability**: The current extractor is difficult to test, and especially difficult to troubleshoot when tests fail, because the current implementation lacks unit tests and relies heavily on integration tests that run in a post-commit GitHub Actions workflow. This makes it harder to trace errors back to their source and adds significant delay to the development process.
12-
2. **Performance**: The current extractor is slow and inefficient, especially when dealing with large projects or complex CDS files. This is due to the way `index-files` processes files, which can lead to long processing times and increased resource usage. There are several performance improvements that could be made to the extractor, but they are all related to avoid work that we either do not need to do or that has already been done.
18+
2. **Performance**: The current extractor is slow and inefficient, especially when dealing with large projects or complex [CDS] files. This is due to the way `index-files` processes files, which can lead to long processing times and increased resource usage. There are several performance improvements that could be made to the extractor, and all of them come down to avoiding work that either does not need to be done or has already been done.
1319

14-
- As one example of a performance problem, using the `index-files` approach means that we are provided with a list of all `.cds` files in the project and are expected to index them all, which makes sense for CodeQL (as we want our database to have a copy of every in-scope source code file) but is horribly inefficient from a CDS perspective as the CDS format allows for a single file to contain multiple CDS definitions. The extractor is expected to be able to handle this by parsing the declarative syntax of the `.cds` file in order to understand which other `.cds` files are to be imported as part of that top-level file, meaning that we are expected to avoid duplicate imports of files that are already (and only) used as library-style imports in top-level (project-level) CDS files. This is a non-trivial task, and the current extractor does not even try to parse the contents of the `.cds` files to determine which files are actually used in the project. Instead, it simply imports all `.cds` files that are found in the project, which can lead to duplicate imports and increased processing times.
20+
- As one example of a performance problem, using the `index-files` approach means that we are provided with a list of all `.cds` files in the project and are expected to index them all, which makes sense for CodeQL (as we want our database to have a copy of every in-scope source code file) but is horribly inefficient from a [CDS] perspective as the [CDS] format allows for a single file to contain multiple [CDS] definitions. The extractor is expected to be able to handle this by parsing the declarative syntax of the `.cds` file in order to understand which other `.cds` files are to be imported as part of that top-level file, meaning that we are expected to avoid duplicate imports of files that are already (and only) used as library-style imports in top-level (project-level) [CDS] files. This is a non-trivial task, and the current extractor does not even try to parse the contents of the `.cds` files to determine which files are actually used in the project. Instead, it simply imports all `.cds` files that are found in the project, which can lead to duplicate imports and increased processing times.
1521

1622
- Another example of a performance problem is that the current `index-files`-based extractor spends a lot of time installing node dependencies because it runs an `npm install` command in every "CDS project directory" that it finds, which is every directory that contains a `package.json` file and either directly contains a `.cds` file (as a sibling of the `package.json` file) or contains one at any depth below it. This means that the extractor will install these dependencies in a directory that we would rather not make changes in just to be able to use a specific version of `@sap/cds` and/or `@sap/cds-dk` (the dependencies that are needed to run the extractor). This also means that if we have five projects that all use the same version of `@sap/cds` and/or `@sap/cds-dk`, we will install that version five separate times in five separate locations, which is both a waste of time and creates a cleanup challenge as the install makes changes to the `package-lock.json` file in each of those five project directories (and also makes changes to the `node_modules` subdirectory of each project directory).
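
The "CDS project directory" rule described in the bullet above (a directory with a `package.json` and at least one `.cds` file at or below it) can be sketched as a pure function over a list of file paths. The function name and the assumption that paths are relative and use `/` separators are choices made for this sketch, not the extractor's actual code.

```typescript
// Returns every directory that holds a package.json and has at least one
// .cds file at or below it. The root directory is represented as ''.
function findCdsProjectDirs(files: string[]): string[] {
  const packageDirs = new Set<string>();
  const cdsFiles: string[] = [];

  for (const file of files) {
    const slash = file.lastIndexOf('/');
    const dir = slash === -1 ? '' : file.slice(0, slash);
    const base = slash === -1 ? file : file.slice(slash + 1);
    if (base === 'package.json') {
      packageDirs.add(dir);
    } else if (base.endsWith('.cds')) {
      cdsFiles.push(file);
    }
  }

  return Array.from(packageDirs)
    .filter(dir => {
      // The trailing '/' keeps 'a' from matching files under 'ab/'.
      const prefix = dir === '' ? '' : `${dir}/`;
      return cdsFiles.some(file => file.startsWith(prefix));
    })
    .sort();
}
```

Running `npm install` once per entry in this list is what produces the repeated installs described above.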
1723

@@ -20,18 +26,21 @@ The current extractor for CDS is based on `index-files`, which has several limit
2026

2127
## Goals for the Future Extractor (using `autobuild`)
2228

23-
The main goals for the `autobuild`-based CDS extractor are to:
29+
The main goals for the `autobuild`-based [CDS] extractor are to:
30+
31+
1. **Improve the Performance of Running the [CDS] Extractor on Large Codebases**:
32+
The performance problems with the current `index-files`-based [CDS] extractor are compounded when running the extractor on large codebases, where the duplicate import problem is magnified in large projects that make heavy use of library-style imports. The `autobuild`-based extractor will be able to avoid this problem by using a more efficient approach to parsing the `.cds` files and determining which files are actually used in the project. This will allow us to avoid duplicate imports and reduce processing times.
2433

25-
1. **Improve the Performance of Running the CDS Extractor on Large Codebases**: The performance problems with the current `index-files`-based CDS extractor are compounded when running the extractor on large codebases, where the duplicate import problem is magnified in large projects that make heavy use of library-style imports. The `autobuild`-based extractor will be able to avoid this problem by using a more efficient approach to parsing the `.cds` files and determining which files are actually used in the project. This will allow us to avoid duplicate imports and reduce processing times.
26-
2. **Improve the Testability of the CDS Extractor**: The `autobuild`-based extractor will be designed to be more testable, with a focus on unit tests and integration tests that can be run in a pre-commit workflow. This will allow us to catch errors early in the development process and make it easier to maintain the code over time. The new extractor will also be designed to be more modular, with a focus on breaking the code into smaller, more manageable pieces that can be tested independently.
34+
2. **Improve the Testability of the [CDS] Extractor**:
35+
The `autobuild`-based extractor will be designed to be more testable, with a focus on unit tests and integration tests that can be run in a pre-commit workflow. This will allow us to catch errors early in the development process and make it easier to maintain the code over time. The new extractor will also be designed to be more modular, with a focus on breaking the code into smaller, more manageable pieces that can be tested independently.
2736

2837
All other goals are secondary to and/or in support of the above two goals.
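
To make the first goal more concrete, here is a minimal sketch of how "which files are only used as library-style imports" might be determined. It assumes a simplified view of CDS `using ... from '<path>'` statements matched with a regular expression, relative import paths, and an implicit `.cds` extension. A real implementation would presumably rely on the CDS compiler's own resolution, which also handles comments, directories, and package imports. All function names here are hypothetical.

```typescript
// Resolves a relative import such as '../db/schema' against the importing file.
// Non-relative specifiers (e.g. '@sap/cds/common') are treated as package imports.
function resolveImport(fromFile: string, spec: string): string | undefined {
  if (!spec.startsWith('.')) {
    return undefined;
  }
  const segments = fromFile.split('/').slice(0, -1);
  for (const part of spec.split('/')) {
    if (part === '.' || part === '') continue;
    if (part === '..') segments.pop();
    else segments.push(part);
  }
  const joined = segments.join('/');
  return joined.endsWith('.cds') ? joined : `${joined}.cds`;
}

// Returns the files that no other file in `sources` imports, i.e. the
// candidates for top-level compilation. `sources` maps path -> file content.
function findTopLevelFiles(sources: Record<string, string>): string[] {
  const imported = new Set<string>();
  for (const file of Object.keys(sources)) {
    // Simplification: matches "using ... from '<path>'" within one statement.
    const pattern = /\busing\b[^;]*?\bfrom\s+['"]([^'"]+)['"]/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(sources[file])) !== null) {
      const target = resolveImport(file, match[1]);
      if (target !== undefined && target in sources) {
        imported.add(target);
      }
    }
  }
  return Object.keys(sources)
    .filter(file => !imported.has(file))
    .sort();
}
```

Only the files returned by `findTopLevelFiles` would need to be compiled directly; the rest are pulled in through their importers.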
2938

3039
## Expected Technical Changes
3140

32-
- The `autobuild.ts` script/code will need to be able to determine its own list of `.cds` files to process when given a "source root" directory to be scanned (recursively) for `.cds` files and will have to maintain some form of state while determining the most efficient way to process all of the applicable CDS statements without duplicating work. This will be done by using a combination of parsing the `.cds` files and using a cache to keep track of which files have already been processed. The cache will be stored in a JSON file that will be created and updated as the extractor runs. This will allow the extractor to avoid re-processing files that have already been processed, which will improve performance and reduce resource usage.
41+
- The `autobuild.ts` script/code will need to be able to determine its own list of `.cds` files to process when given a "source root" directory to be scanned (recursively) for `.cds` files and will have to maintain some form of state while determining the most efficient way to process all of the applicable [CDS] statements without duplicating work. This will be done by using a combination of parsing the `.cds` files and using a cache to keep track of which files have already been processed. The cache will be stored in a JSON file that will be created and updated as the extractor runs. This will allow the extractor to avoid re-processing files that have already been processed, which will improve performance and reduce resource usage.
3342

34-
- Keep track of the unique set of `@sap/cds` and `@sap/cds-dk` dependency combinations that are used by any "project directory" found under the "source root" directory. Also, create a temporary directory structure for storing the `package.json`, `package-lock.json`, and `node_modules` subdirectory for each unique combination of `@sap/cds` and `@sap/cds-dk` dependencies. This will allow us to avoid installing the same version of these dependencies multiple times in different project directories, which will improve performance and reduce resource usage. The temporary directory structure will be created in a subdirectory of the "source root" directory, and will be cleaned up after the extractor has finished running. This will allow us to be much more efficient in terms of installing CDS compiler dependencies, much more explicit about which version of the CDS compiler we are using for a given (sub-)project, will allow us to avoid making changes to the `package.json` and `package-lock.json` files in the project directories, and will allow us to avoid installing the same version of these dependencies multiple times in different project directories.
43+
- Keep track of the unique set of `@sap/cds` and `@sap/cds-dk` dependency combinations that are used by any "project directory" found under the "source root" directory. Also, create a temporary directory structure for storing the `package.json`, `package-lock.json`, and `node_modules` subdirectory for each unique combination of `@sap/cds` and `@sap/cds-dk` dependencies. The temporary directory structure will be created in a subdirectory of the "source root" directory and will be cleaned up after the extractor has finished running. This will allow us to install each version of the [CDS] compiler dependencies only once (improving performance and reducing resource usage), to be much more explicit about which version of the [CDS] compiler we are using for a given (sub-)project, and to avoid making changes to the `package.json` and `package-lock.json` files in the project directories.
3544

3645
- Use a new `autobuild.ts` script as the main entry point for the extractor's TypeScript code, meaning that the build process will compile the TypeScript code in `autobuild.ts` to JavaScript code in `autobuild.js`, which will then be run as the main entry point for the extractor. Instead of `index-files.cmd` and `index-files.sh`, we will have wrapper scripts such as `autobuild.cmd` and `autobuild.sh` that will be used to run the `autobuild.js` script in different environments (i.e. Windows and Unix-like environments).
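
The de-duplication of `@sap/cds` and `@sap/cds-dk` combinations mentioned in this list could look roughly like the sketch below. It assumes the version ranges are read from each project's `package.json` (checking `dependencies` before `devDependencies`) and that a missing entry falls back to `'latest'`. Both of those choices, and all type and function names, are assumptions for illustration only.

```typescript
// Minimal view of the package.json fields this sketch reads.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// One install location would be needed per combination, not per project.
interface DependencyCombination {
  cds: string;
  cdsDk: string;
  projectDirs: string[];
}

function versionOf(pkg: PackageJson, name: string): string {
  // Assumption: fall back to 'latest' when the project does not pin a version.
  return pkg.dependencies?.[name] ?? pkg.devDependencies?.[name] ?? 'latest';
}

// `projects` maps a project directory to its parsed package.json.
function groupByCdsDependencies(projects: Record<string, PackageJson>): DependencyCombination[] {
  const groups = new Map<string, DependencyCombination>();
  for (const dir of Object.keys(projects).sort()) {
    const cds = versionOf(projects[dir], '@sap/cds');
    const cdsDk = versionOf(projects[dir], '@sap/cds-dk');
    const key = `${cds}|${cdsDk}`;
    const group = groups.get(key) ?? { cds, cdsDk, projectDirs: [] };
    group.projectDirs.push(dir);
    groups.set(key, group);
  }
  return Array.from(groups.values());
}
```

With five projects that share one version pair, this yields a single group, so the dependencies would be installed once in a temporary location instead of five times.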
3746

@@ -41,16 +50,24 @@ All other goals are secondary to and/or in support of the above two goals.
4150

4251
- Add unit tests for everything that can be unit tested. This will allow us to catch errors early in the development process and make it easier to maintain the code over time. We will use a testing framework, such as Jest or Mocha, to run the tests and report the results as part of the pre-commit build process, which will require modifications to the `package.json` file to include the necessary dependencies and scripts. The unit tests will live under a new `test` directory (project path would be `extractors/cds/tools/test`), organized into subdirectories based on functionality and mirroring the structure of the `src` directory. For example, if we add a `src/parsers/cdsParser.ts` file, we will also add a `test/parsers/cdsParser.test.ts` file that contains the unit tests for the `cdsParser.ts` file. This will allow us to keep the tests organized and easy to navigate.
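
The JSON cache of already-processed files described in the first item of this list can be kept easy to unit test by separating the cache logic from file I/O. The sketch below is one possible shape, not the planned implementation. The `fingerprint` value (for example a content hash) is an assumption, since the guide only says the cache tracks which files have been processed, and every name here is hypothetical.

```typescript
// Hypothetical cache shape: file path -> fingerprint of the content that was processed.
interface ProcessedCache {
  processed: Record<string, string>;
}

// Parses the cache file content; a missing file (undefined) yields an empty cache.
function parseCache(json?: string): ProcessedCache {
  if (json === undefined) {
    return { processed: {} };
  }
  const parsed = JSON.parse(json) as Partial<ProcessedCache>;
  return { processed: parsed.processed ?? {} };
}

// A file needs work if it was never processed or its fingerprint changed.
function needsProcessing(cache: ProcessedCache, file: string, fingerprint: string): boolean {
  return cache.processed[file] !== fingerprint;
}

// Returns a new cache rather than mutating, which keeps tests simple.
function markProcessed(cache: ProcessedCache, file: string, fingerprint: string): ProcessedCache {
  return { processed: { ...cache.processed, [file]: fingerprint } };
}

// Produces the text that would be written back to the JSON cache file.
function serializeCache(cache: ProcessedCache): string {
  return JSON.stringify(cache, null, 2);
}
```

Because none of these functions touch the file system, a test such as the `test/parsers/cdsParser.test.ts` example above could exercise them without temporary files.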
4352

44-
## Examples of Improved CDS Parsing
53+
## Examples of Improved [CDS] Parsing
4554

4655
TODO
4756

48-
### Example 1: Parsing an `index.cds` CDS File with Multiple Definitions
57+
### Example 1: Parsing an `index.cds` [CDS] File with Multiple Definitions
4958

5059
```cds
5160
```
5261

53-
### Example 2: Parsing a `schema.cds` CDS File with Multiple Definitions
62+
### Example 2: Parsing a `schema.cds` [CDS] File with Multiple Definitions
5463

5564
```cds
5665
```
66+
67+
## References
68+
69+
[CAP]: https://cap.cloud.sap/docs/about/
70+
[CDS]: https://cap.cloud.sap/docs/cds/
71+
72+
- The [Cloud Application Programming][CAP] Model.
73+
- [Core Data Services][CDS] (CDS) in the Cloud Application Programming (CAP) Model.
