-
Notifications
You must be signed in to change notification settings - Fork 2
CDS Extractor Rewrite Phase 2 : Improve Performance and Precision #195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
data-douser
wants to merge
40
commits into
advanced-security:main
Choose a base branch
from
data-douser:data-douser/cds-ts-rewrite-2
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 29 commits
Commits
Show all changes
40 commits
Select commit
Hold shift + click to select a range
5bafe3d
Refactor CDS extractor for dedicated "cds" package
data-douser e4c1ff0
Fix CDS extractor findPackageJsonDirs
data-douser 7e58207
Rename CDS extractor entrypoint and refactor args
data-douser af80a68
Add self-parser.test.ts for CDS extractor
data-douser ea19649
CDS extractor tests for compiler & packageManager
data-douser f9e41aa
Fix CDS extractor environment setup
data-douser 6412400
Improve CDS extractor logging
data-douser 009fe42
First attempt at project-aware CDS compilation
data-douser 2c68c8a
Refactor CDS extractor for dedicated "cds" package
data-douser 8e09758
Rename CDS extractor entrypoint and refactor args
data-douser 1dd464b
Add self-parser.test.ts for CDS extractor
data-douser 729dd2e
CDS extractor tests for compiler & packageManager
data-douser 0c75133
Fix CDS extractor environment setup
data-douser 09fa955
Improve CDS extractor logging
data-douser c865d94
First attempt at project-aware CDS compilation
data-douser d629d1e
Merge branch 'data-douser/cds-ts-rewrite-2' of github.com:data-douser…
data-douser 5aa2d54
Update node dependencies for CDS extractor
data-douser 86b5572
Update CDS extractor flowchart diagram
data-douser d6a99da
Merge branch 'main' into data-douser/cds-ts-rewrite-2
data-douser bc82815
Fixes CDS extractor project-aware file detection
data-douser 6315a49
Remove "--parse" from CDS compile command
data-douser bc4a2cd
Merge branch 'advanced-security:main' into data-douser/cds-ts-rewrite-2
data-douser 27743ba
Simplify CDS extractor logic and refactor
data-douser 7a05f80
Fix project-aware CDS compile file paths
data-douser af066d7
Merge branch 'advanced-security:main' into data-douser/cds-ts-rewrite-2
data-douser 0ba67d5
Fix code-scanning alerts for insecure tmp files
data-douser 0f1ac9e
Improve testing of CDS extractor graph
data-douser aaab73b
Update CDS extractor node dependencies
data-douser e4bbc1f
Implement cdsExtractorLog for consistent logging
data-douser 99923e0
Common project graph for CDS parse and compile
data-douser efc989f
Fix CdlService getImplementation file location
data-douser 9dfbe3b
Use shell-quote.quote in testCdsCommand()
data-douser fcac0f1
Fix unit test for CDS extractor
data-douser 3573995
Fix CDS extractor args validation
data-douser 1c5d011
Improve cdsExtractorLog for debugging performance
data-douser 359290e
Replace Math.random() in CDS extractor
data-douser d4dd37a
Refactor CDS extractor run modes
data-douser 90453cc
Fix CDS extractor monorepo support
data-douser aa7dac4
Replace CDS extractor autobuild.md with README.md
data-douser cc29a54
Add CDS extractor JS dist and node_modules
data-douser File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -71,3 +71,5 @@ tmp/ | |
**.testproj | ||
dbs | ||
*.cds.json | ||
.cds-extractor-cache | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,182 @@ | ||
import { | ||
buildCdsProjectDependencyGraph, | ||
compileCdsToJson, | ||
determineCdsCommand, | ||
findProjectForCdsFile, | ||
} from './src/cds'; | ||
import { CdsProjectMapWithDebugSignals } from './src/cds/parser/types'; | ||
import { runJavaScriptExtractor } from './src/codeql'; | ||
import { addCompilationDiagnostic } from './src/diagnostics'; | ||
import { configureLgtmIndexFilters, setupAndValidateEnvironment } from './src/environment'; | ||
import { cdsExtractorLog, setSourceRootDirectory } from './src/logging'; | ||
import { installDependencies } from './src/packageManager'; | ||
import { RunMode } from './src/runMode'; | ||
import { validateArguments } from './src/utils'; | ||
|
||
// Validate arguments to this script. | ||
// The first argument we pass is the expected run mode, which will be extracted from process.argv[2] | ||
// This will determine the correct minimum argument count for validation | ||
const validationResult = validateArguments(process.argv, RunMode.AUTOBUILD); | ||
if (!validationResult.isValid) { | ||
console.warn(validationResult.usageMessage); | ||
// Exit with an error code on invalid use of this script. | ||
process.exit(1); | ||
} | ||
|
||
// Get the validated and sanitized arguments | ||
const { runMode, sourceRoot } = validationResult.args!; | ||
|
||
// Initialize the unified logging system with the source root directory | ||
setSourceRootDirectory(sourceRoot); | ||
|
||
// Check for autobuild mode | ||
if (runMode === (RunMode.AUTOBUILD as string)) { | ||
cdsExtractorLog('info', 'Autobuild mode is not implemented yet.'); | ||
process.exit(1); | ||
} | ||
|
||
// Setup the environment and validate all requirements first, before changing directory | ||
// This ensures we can properly locate the CodeQL tools | ||
const { | ||
success: envSetupSuccess, | ||
errorMessages, | ||
codeqlExePath, | ||
autobuildScriptPath, | ||
platformInfo, | ||
} = setupAndValidateEnvironment(sourceRoot); | ||
|
||
if (!envSetupSuccess) { | ||
const codeqlExe = platformInfo.isWindows ? 'codeql.exe' : 'codeql'; | ||
cdsExtractorLog( | ||
'warn', | ||
`'${codeqlExe} database index-files --language cds' terminated early due to: ${errorMessages.join( | ||
', ', | ||
)}.`, | ||
); | ||
// Exit with an error code when environment setup fails. | ||
process.exit(1); | ||
} | ||
|
||
// Force this script, and any process it spawns, to use the project (source) root | ||
// directory as the current working directory. | ||
process.chdir(sourceRoot); | ||
|
||
cdsExtractorLog( | ||
'info', | ||
`CodeQL CDS extractor using run mode '${runMode}' for scan of project source root directory '${sourceRoot}'.`, | ||
); | ||
|
||
// Using the new project-aware approach to find CDS projects and their dependencies | ||
cdsExtractorLog('info', 'Detecting CDS projects and analyzing their structure...'); | ||
|
||
// Build the project dependency graph using the project-aware parser | ||
// Pass the script directory (__dirname) to support debug-parser mode internally | ||
const projectMap = buildCdsProjectDependencyGraph(sourceRoot, runMode, __dirname); | ||
|
||
// Cast to the interface with debug signals to properly handle debug mode | ||
const typedProjectMap = projectMap as CdsProjectMapWithDebugSignals; | ||
|
||
// Check if we're in debug-parser mode and should exit (based on signals from buildCdsProjectDependencyGraph) | ||
if (typedProjectMap.__debugParserSuccess) { | ||
cdsExtractorLog('info', 'Debug parser mode completed successfully.'); | ||
process.exit(0); | ||
} else if (typedProjectMap.__debugParserFailure) { | ||
cdsExtractorLog('warn', 'No CDS projects found. Cannot generate debug information.'); | ||
process.exit(1); | ||
} | ||
|
||
// Install dependencies of discovered CAP/CDS projects | ||
cdsExtractorLog( | ||
'info', | ||
'Ensuring dependencies are installed in cache for required CDS compiler versions...', | ||
); | ||
const projectCacheDirMap = installDependencies(projectMap, sourceRoot, codeqlExePath); | ||
|
||
const cdsFilePathsToProcess: string[] = []; | ||
|
||
cdsExtractorLog('info', 'Extracting CDS files from discovered projects...'); | ||
|
||
// Use the project map to collect all `.cds` files from each project. | ||
// We want to "extract" all `.cds` files from all projects so that we have a copy | ||
// of each `.cds` source file in the CodeQL database. | ||
for (const [, project] of projectMap.entries()) { | ||
cdsFilePathsToProcess.push(...project.cdsFiles); | ||
} | ||
|
||
cdsExtractorLog('info', 'Processing CDS files to JSON ...'); | ||
|
||
// Collect files that need compilation, handling project-level compilation | ||
const cdsFilesToCompile: string[] = []; | ||
const projectsForProjectLevelCompilation = new Set<string>(); | ||
|
||
for (const [projectDir, project] of projectMap.entries()) { | ||
if (project.cdsFilesToCompile.includes('__PROJECT_LEVEL_COMPILATION__')) { | ||
// This project needs project-level compilation | ||
projectsForProjectLevelCompilation.add(projectDir); | ||
// We'll only compile one file per project to trigger project-level compilation | ||
// Use the first CDS file as a representative | ||
if (project.cdsFiles.length > 0) { | ||
cdsFilesToCompile.push(project.cdsFiles[0]); | ||
} | ||
} else { | ||
// Normal individual file compilation | ||
cdsFilesToCompile.push(...project.cdsFilesToCompile); | ||
} | ||
} | ||
|
||
cdsExtractorLog( | ||
'info', | ||
`Found ${cdsFilePathsToProcess.length} total CDS files, ${cdsFilesToCompile.length} files to compile (${projectsForProjectLevelCompilation.size} project-level compilations)`, | ||
); | ||
|
||
// Evaluate each `.cds` source file that should be compiled to JSON. | ||
for (const rawCdsFilePath of cdsFilesToCompile) { | ||
try { | ||
// Find which project this CDS file belongs to, to use the correct cache directory | ||
const projectDir = findProjectForCdsFile(rawCdsFilePath, sourceRoot, projectMap); | ||
const cacheDir = projectDir ? projectCacheDirMap.get(projectDir) : undefined; | ||
|
||
// Determine the CDS command to use based on the cache directory for this specific file | ||
const cdsCommand = determineCdsCommand(cacheDir); | ||
|
||
// Use resolved path directly instead of passing through getArg | ||
// Pass the project dependency information to enable project-aware compilation | ||
const compilationResult = compileCdsToJson( | ||
rawCdsFilePath, | ||
sourceRoot, | ||
cdsCommand, | ||
cacheDir, | ||
projectMap, | ||
projectDir, | ||
); | ||
|
||
if (!compilationResult.success && compilationResult.message) { | ||
cdsExtractorLog( | ||
'error', | ||
`adding diagnostic for source file=${rawCdsFilePath} : ${compilationResult.message} ...`, | ||
); | ||
addCompilationDiagnostic(rawCdsFilePath, compilationResult.message, codeqlExePath); | ||
} | ||
} catch (errorMessage) { | ||
cdsExtractorLog( | ||
'error', | ||
`adding diagnostic for source file=${rawCdsFilePath} : ${String(errorMessage)} ...`, | ||
); | ||
addCompilationDiagnostic(rawCdsFilePath, String(errorMessage), codeqlExePath); | ||
} | ||
} | ||
|
||
// Configure the "LGTM" index filters for proper extraction. | ||
configureLgtmIndexFilters(); | ||
|
||
// Run CodeQL's JavaScript extractor to process the compiled JSON files. | ||
const extractorResult = runJavaScriptExtractor(sourceRoot, autobuildScriptPath, codeqlExePath); | ||
if (!extractorResult.success && extractorResult.error) { | ||
cdsExtractorLog('error', `Error running JavaScript extractor: ${extractorResult.error}`); | ||
} | ||
|
||
// Use the `cds-extractor.js` name in the log message as that is the name of the script | ||
// that is actually run by the `codeql database index-files` command. This TypeScript | ||
// file is where the code/logic is edited/implemented, but the runnable script is | ||
// generated by the TypeScript compiler and is named `cds-extractor.js`. | ||
console.log(`Completed run of cds-extractor.js script for CDS extractor.`); |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why
RunMode.AUTOBUILD
as the expected mode here? I would have expectedINDEX_FILES
to be the default.I don't think it matters in practice because the
validateArguments
function only seems to use it to validate a minimum number of arguments - which I would expect it to do by checking the actual run mode.