Idea: Re-indexing changes #35

Description

@qdrddr

I wanted to share a few ideas, and hope they could be useful:

Since octocode relies on a LanceDB database, it seems this addition introduces the need to re-sync or re-index after changes, ideally re-indexing only the changed parts of the code.

One possible approach could be to leverage Git itself to track the last indexed state. For example, a dedicated "sister" technical/octocode branch could be maintained for each branch (e.g., octocode-main corresponding to main). This branch would store the repository code state at the moment of the last successful DB indexing. The indexed DB itself should probably be ignored by Git, and a single DB could likely store the information for all branches.

With this setup, the system could compute a diff between the current branch and the octocode branch, allowing it to identify only the modified portions of the codebase that require re-indexing.
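As a rough illustration of the diff step, here is a minimal Python sketch. It is not octocode's implementation: the names `ReindexPlan`, `parse_name_status`, `plan_reindex`, and `mark_indexed` are hypothetical, and it assumes the sister branch is called `octocode-main` as in the example above. It only relies on `git diff --name-status` and `git branch --force`, and works at file granularity.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class ReindexPlan:
    # Paths whose chunks would need to be (re)indexed.
    to_index: set[str] = field(default_factory=set)
    # Paths whose chunks would need to be dropped from the DB.
    to_delete: set[str] = field(default_factory=set)


def parse_name_status(output: str) -> ReindexPlan:
    """Turn `git diff --name-status` output into a re-index plan."""
    plan = ReindexPlan()
    for line in output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        code = status[0]  # "R100" -> "R", "C75" -> "C"
        if code in ("A", "M", "T"):      # added, modified, type changed
            plan.to_index.add(paths[0])
        elif code == "D":                # deleted
            plan.to_delete.add(paths[0])
        elif code == "R":                # renamed: old path goes, new path is indexed
            plan.to_delete.add(paths[0])
            plan.to_index.add(paths[1])
        elif code == "C":                # copied: only the new path is new
            plan.to_index.add(paths[1])
    return plan


def plan_reindex(repo: str, indexed_ref: str = "octocode-main",
                 head: str = "main") -> ReindexPlan:
    """Diff the last-indexed state against the current branch.

    Note: Git quotes paths with unusual characters unless `-z` is used,
    which this sketch does not handle.
    """
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-status", "--find-renames",
         indexed_ref, head],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_name_status(out)


def mark_indexed(repo: str, indexed_ref: str = "octocode-main",
                 head: str = "main") -> None:
    """After a successful indexing run, move the sister branch to `head`."""
    subprocess.run(
        ["git", "-C", repo, "branch", "--force", indexed_ref, head],
        check=True,
    )
```

The parsing is kept separate from the Git call so it can be checked without a repository, e.g. `parse_name_status("M\tsrc/a.rs\nD\tsrc/c.rs\n")` yields `src/a.rs` to index and `src/c.rs` to delete. Moving the sister branch only after indexing succeeds means a failed run is simply retried against the same diff.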

This approach could also make it possible to capture per-commit graph versions and to build and traverse a unified graph across branches, stored, as mentioned above, in a single database shared by all branches.

If combined with an AST-aware diff tool, we could extract semantic code changes rather than raw text diffs and update the index more precisely. For example: https://github.com/afnanenayet/diffsitter

This might enable incremental graph updates instead of full re-indexing.
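To make the difference between a semantic diff and a raw text diff concrete, here is a small sketch. It does not use diffsitter or tree-sitter; as a stand-in it uses Python's standard `ast` module, so it only handles Python sources, and the function names `definition_fingerprints` and `semantic_changes` are hypothetical. The idea is the same: compare syntax trees per definition, so that whitespace and comment edits do not trigger re-indexing.

```python
import ast


def definition_fingerprints(source: str) -> dict[str, str]:
    """Map each top-level function/class name to a dump of its syntax tree."""
    tree = ast.parse(source)
    return {
        # ast.dump omits line/column attributes by default, and comments
        # never appear in the tree, so layout-only edits do not change it.
        node.name: ast.dump(node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def semantic_changes(old_source: str, new_source: str) -> dict[str, list[str]]:
    """Report which top-level definitions were added, removed, or changed."""
    old = definition_fingerprints(old_source)
    new = definition_fingerprints(new_source)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


old_src = "def f(x):\n    return x + 1\n\ndef g():\n    return 0\n"
new_src = (
    "# reformatted, same logic\n"
    "def f(x):\n    return x  +  1\n\n"
    "def g():\n    return 1\n\n"
    "def h():\n    pass\n"
)
print(semantic_changes(old_src, new_src))
# {'added': ['h'], 'removed': [], 'changed': ['g']}
```

Here `f` differs textually but not structurally, so only `g` and `h` would need their graph nodes updated.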

Labels

enhancement (New feature or request)