feat: submodule discovery#23
Conversation
There was a problem hiding this comment.
Pull request overview
Adds submodule discovery so repositories declared in .gitmodules can be detected and included in the extension’s repo set (including nested submodules).
Changes:
- Add
DataSource.getSubmodules()to enumerate initialized submodules from.gitmodules. - Extend
RepoManagerto discover submodules when adding repos, on startup, and when.gitmoduleschanges. - Add Vitest coverage for
getSubmodules.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/backend/dataSource.submodules.test.ts | New tests that create temp git repos and validate getSubmodules behavior. |
| src/repoManager.ts | Hooks submodule discovery into repo addition, startup, and watcher event handling. |
| src/dataSource.ts | Implements .gitmodules reading/parsing and filters to initialized submodule repos. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fs.readFile(path.join(repo, ".gitmodules"), { encoding: "utf8" }, async (err, data) => { | ||
| const submodules: string[] = []; | ||
| if (err) { | ||
| resolve(submodules); | ||
| return; | ||
| } | ||
|
|
||
| const lines = data.split(eolRegex); | ||
| const sectionRegex = /^\s*\[.*\]\s*$/; | ||
| const submoduleSectionRegex = /^\s*\[submodule "([^"]+)"\]\s*$/; | ||
| const pathRegex = /^\s*path\s*=\s*(.*)$/; | ||
| let inSubmoduleSection = false; | ||
|
|
||
| for (let i = 0; i < lines.length; i++) { | ||
| const line = lines[i]; | ||
| if (sectionRegex.test(line)) { | ||
| inSubmoduleSection = submoduleSectionRegex.test(line); | ||
| continue; | ||
| } | ||
|
|
||
| if (!inSubmoduleSection) continue; | ||
|
|
||
| const match = line.match(pathRegex); | ||
| if (match === null) continue; | ||
|
|
||
| const resolvedPath = path.resolve(repo, match[1].trim()); | ||
| const relativeToRepo = path.relative(repo, resolvedPath); | ||
| // Ignore entries that escape the repository root. | ||
| if (relativeToRepo.startsWith("..") || path.isAbsolute(relativeToRepo)) continue; | ||
|
|
||
| const submodulePath = getPathFromStr(resolvedPath); | ||
| if (submodules.includes(submodulePath)) continue; | ||
|
|
||
| if (await this.isGitRepository(submodulePath)) { | ||
| submodules.push(submodulePath); | ||
| } | ||
| } | ||
|
|
||
| resolve(submodules); | ||
| }); |
There was a problem hiding this comment.
getSubmodules manually parses .gitmodules using regexes. .gitmodules uses git-config syntax (quoting/escaping, comments, multi-line values), so this parsing can mis-handle valid files (e.g., quoted path = "foo bar"). Consider querying paths via git config -f .gitmodules --get-regexp (or similar) and parsing git’s output instead of re-implementing config parsing.
| fs.readFile(path.join(repo, ".gitmodules"), { encoding: "utf8" }, async (err, data) => { | |
| const submodules: string[] = []; | |
| if (err) { | |
| resolve(submodules); | |
| return; | |
| } | |
| const lines = data.split(eolRegex); | |
| const sectionRegex = /^\s*\[.*\]\s*$/; | |
| const submoduleSectionRegex = /^\s*\[submodule "([^"]+)"\]\s*$/; | |
| const pathRegex = /^\s*path\s*=\s*(.*)$/; | |
| let inSubmoduleSection = false; | |
| for (let i = 0; i < lines.length; i++) { | |
| const line = lines[i]; | |
| if (sectionRegex.test(line)) { | |
| inSubmoduleSection = submoduleSectionRegex.test(line); | |
| continue; | |
| } | |
| if (!inSubmoduleSection) continue; | |
| const match = line.match(pathRegex); | |
| if (match === null) continue; | |
| const resolvedPath = path.resolve(repo, match[1].trim()); | |
| const relativeToRepo = path.relative(repo, resolvedPath); | |
| // Ignore entries that escape the repository root. | |
| if (relativeToRepo.startsWith("..") || path.isAbsolute(relativeToRepo)) continue; | |
| const submodulePath = getPathFromStr(resolvedPath); | |
| if (submodules.includes(submodulePath)) continue; | |
| if (await this.isGitRepository(submodulePath)) { | |
| submodules.push(submodulePath); | |
| } | |
| } | |
| resolve(submodules); | |
| }); | |
| cp.execFile( | |
| "git", | |
| ["-C", repo, "config", "-f", ".gitmodules", "--get-regexp", "^submodule\\..*\\.path$"], | |
| async (err, stdout, _stderr) => { | |
| const submodules: string[] = []; | |
| if (err) { | |
| // If `.gitmodules` does not exist or there are no matching entries, | |
| // resolve with an empty list to preserve existing behavior. | |
| resolve(submodules); | |
| return; | |
| } | |
| const lines = stdout.split(eolRegex); | |
| for (const line of lines) { | |
| if (!line.trim()) continue; | |
| const firstSpace = line.indexOf(" "); | |
| if (firstSpace === -1) continue; | |
| const value = line.substring(firstSpace + 1).trim(); | |
| if (!value) continue; | |
| const resolvedPath = path.resolve(repo, value); | |
| const relativeToRepo = path.relative(repo, resolvedPath); | |
| // Ignore entries that escape the repository root. | |
| if (relativeToRepo.startsWith("..") || path.isAbsolute(relativeToRepo)) continue; | |
| const submodulePath = getPathFromStr(resolvedPath); | |
| if (submodules.includes(submodulePath)) continue; | |
| if (await this.isGitRepository(submodulePath)) { | |
| submodules.push(submodulePath); | |
| } | |
| } | |
| resolve(submodules); | |
| } | |
| ); |
| @@ -298,7 +329,9 @@ export class RepoManager { | |||
| let path, | |||
| changes = false; | |||
| while ((path = this.changeEventPaths.shift())) { | |||
| if (!(await doesPathExist(path))) { | |||
| if (path.endsWith("/.gitmodules")) { | |||
| if (await this.checkReposForNewSubmodules()) changes = true; | |||
| } else if (!(await doesPathExist(path))) { | |||
There was a problem hiding this comment.
On .gitmodules create/change events, processCreateEvents / processChangeEvents call checkReposForNewSubmodules(), which scans all known repos and runs submodule discovery for each. Since the watcher path is the specific .gitmodules file, this can be much more expensive than necessary in large workspaces. Prefer deriving the repo root from the event path (e.g., dirname(path)), and only scanning that repo (plus optionally its known descendants).
| afterAll(() => { | ||
| fs.rmSync(parentRepo, { recursive: true, force: true }); | ||
| fs.rmSync(submoduleRepo, { recursive: true, force: true }); | ||
| fs.rmSync(standaloneRepo, { recursive: true, force: true }); | ||
| }); |
There was a problem hiding this comment.
afterAll unconditionally calls fs.rmSync on parentRepo/submoduleRepo/standaloneRepo. If beforeAll throws before these variables are assigned, Vitest will still run afterAll and this teardown will throw (masking the real failure). Guard the cleanup (e.g., check the variable is a non-empty string / path exists) or initialize the vars to empty strings and conditionalize the rmSync calls.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| this.execGit( | ||
| 'config -f .gitmodules --get-regexp "^submodule\\..*\\.path$"', | ||
| repo, | ||
| async (err, stdout) => { |
There was a problem hiding this comment.
getSubmodules wraps the entire git command in single quotes, which makes the shell pass it to git as a single argument (e.g., git 'config -f ...') and will fail (and is especially problematic on Windows cmd.exe). Build the command without wrapping quotes, or preferably switch to spawnGit/execFile with an args array so the --get-regexp pattern is passed as its own argument without shell-escaping issues.
| import * as path from "node:path"; | ||
| import * as cp from "node:child_process"; |
There was a problem hiding this comment.
Importing node:path as path introduces identifier shadowing with existing parameters named path (e.g., isGitRepository(path: string)). Consider renaming those parameters (e.g., repoPath/dir) to avoid confusion and make future use of the path module inside those methods possible.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| describe("getSubmodules", () => { | ||
| it("returns initialized submodules declared in .gitmodules", async () => { | ||
| const dataSource = new DataSource(); | ||
|
|
||
| const submodules = await dataSource.getSubmodules(parentRepo); | ||
|
|
||
| expect(submodules).toEqual([path.join(parentRepo, "modules/child").replace(/\\/g, "/")]); | ||
| }); |
There was a problem hiding this comment.
PR description mentions nested submodules, but the test suite here only covers a single-level submodule and the no-.gitmodules case. Add coverage for nested submodules (e.g. parent -> child -> grandchild) and/or an uninitialized/missing submodule entry to ensure discovery behaves correctly across the scenarios this feature targets.
| const resolvedPath = path.resolve(repo, rawPath); | ||
| const relativeToRepo = path.relative(repo, resolvedPath); | ||
| // Ignore entries that escape the repository root. | ||
| if (relativeToRepo.startsWith("..") || path.isAbsolute(relativeToRepo)) continue; |
There was a problem hiding this comment.
The repo-escape guard is overly broad: relativeToRepo.startsWith("..") will also reject legitimate submodule paths whose first path segment merely starts with .. (e.g. ..my-submodule). Consider tightening the check to only reject .. as an actual path segment (e.g. relativeToRepo === ".." or relativeToRepo.startsWith(".." + path.sep)), while keeping the cross-drive absolute-path guard.
| if (relativeToRepo.startsWith("..") || path.isAbsolute(relativeToRepo)) continue; | |
| if ( | |
| relativeToRepo === ".." || | |
| relativeToRepo.startsWith(".." + path.sep) || | |
| path.isAbsolute(relativeToRepo) | |
| ) { | |
| continue; | |
| } |
| if (isRepo) { | ||
| this.addRepo(directory); | ||
| resolve(true); | ||
| this.addRepo(directory).then(resolve); |
There was a problem hiding this comment.
Any reason to change this logic?
Adds submodule discovery so repositories declared in .gitmodules can be detected and included in the extension’s repo set (including nested submodules).