I wanted to share a few ideas that I hope could be useful:
Since octocode relies on a LanceDB database, this addition seems to introduce the need to re-sync or re-index after changes, ideally re-indexing only the parts of the code that changed.
One possible approach could be to leverage Git itself to track the last indexed state. For example, a dedicated "sister" technical (octocode) branch could be maintained for each branch (e.g., octocode-main corresponding to main). This branch would store the repository state at the moment of the last successful DB indexing. The indexed DB itself should probably be git-ignored, and a single DB could likely store the information for all branches.
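To make the bookkeeping concrete, here is a minimal sketch of how the marker branch could be moved after a successful indexing run. The `octocode-` prefix follows the naming example above; the function names are hypothetical and not part of octocode. It assumes plain `git branch -f`, which creates the marker branch if it is missing and resets it otherwise (Git refuses this if the marker branch is currently checked out).

```python
import subprocess


def marker_branch(branch: str, prefix: str = "octocode-") -> str:
    """Name of the hypothetical marker branch that shadows `branch`."""
    return f"{prefix}{branch}"


def mark_indexed_command(branch: str) -> list[str]:
    """Build the git command that points the marker branch at `branch`."""
    return ["git", "branch", "-f", marker_branch(branch), branch]


def mark_indexed(branch: str) -> None:
    """Run only after the DB indexing for `branch` has finished successfully."""
    subprocess.run(mark_indexed_command(branch), check=True)
```

Calling `mark_indexed("main")` only after indexing has succeeded means a failed or interrupted run leaves the marker at the previous state, so the next run would simply diff against the older commit.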
With this setup, the system could compute a diff between the current branch and the octocode branch, allowing it to identify only the modified portions of the codebase that require re-indexing.
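As a rough sketch of that diff step, the snippet below turns the output of `git diff --name-status` into a list of files to re-index and a list of files whose entries should be dropped. The `IndexPlan` type, the function names, and the `octocode-` prefix are my own assumptions for illustration. It only compares committed states, and paths with unusual characters would need `-z` handling that is omitted here.

```python
import subprocess
from typing import NamedTuple


class IndexPlan(NamedTuple):
    reindex: list[str]  # paths whose content must be indexed again
    remove: list[str]   # paths whose entries must be dropped from the DB


def parse_name_status(output: str) -> IndexPlan:
    """Parse `git diff --name-status` output (tab-separated fields)."""
    reindex: list[str] = []
    remove: list[str] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        kind = status[0]  # renames/copies carry a score, e.g. "R100"
        if kind in ("A", "M", "T"):
            reindex.append(paths[0])
        elif kind == "D":
            remove.append(paths[0])
        elif kind == "R":  # rename: old path disappears, new path appears
            remove.append(paths[0])
            reindex.append(paths[1])
        elif kind == "C":  # copy: only the new path needs indexing
            reindex.append(paths[1])
    return IndexPlan(reindex, remove)


def plan_reindex(branch: str = "main", prefix: str = "octocode-") -> IndexPlan:
    """Diff the marker branch against `branch` and plan the DB update."""
    result = subprocess.run(
        ["git", "diff", "--name-status", f"{prefix}{branch}", branch],
        check=True, capture_output=True, text=True,
    )
    return parse_name_status(result.stdout)
```

With something like this, a run on `main` would touch only the paths returned by `plan_reindex("main")` instead of walking the whole tree.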
This approach would also make it possible to version the graph per commit and to build and traverse a unified graph across branches, stored, as mentioned above, in a single database shared by all branches.
If combined with an AST-aware diff tool, we could extract semantic code changes rather than raw text diffs and update the index more precisely. For example: https://github.com/afnanenayet/diffsitter
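To illustrate what "semantic rather than textual" could mean in practice, here is a small stand-in sketch. It does not use diffsitter or tree-sitter; it uses Python's standard `ast` module on Python sources only, and the function names are hypothetical. The idea is the same though: compare syntax trees per symbol, so that formatting and comment changes do not trigger a re-index.

```python
import ast


def symbol_fingerprints(source: str) -> dict[str, str]:
    """Map each top-level function/class name to a dump of its syntax tree.

    ast.dump() omits line and column numbers by default, so moving a
    definition or reformatting it leaves the fingerprint unchanged.
    """
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def changed_symbols(old_source: str, new_source: str) -> dict[str, set[str]]:
    """Classify top-level symbols as added, removed, or modified."""
    old = symbol_fingerprints(old_source)
    new = symbol_fingerprints(new_source)
    return {
        "added": new.keys() - old.keys(),
        "removed": old.keys() - new.keys(),
        "modified": {n for n in old.keys() & new.keys() if old[n] != new[n]},
    }
```

Only the symbols reported here would need their index entries or graph nodes refreshed. Note that docstrings are part of the tree, so a docstring edit still counts as a modification in this sketch.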
This might enable incremental graph updates instead of full re-indexing.