Updating embeddings of changed documents #257

bilogic · 2025-08-10T11:28:23Z

I have fed a bunch of documents into my RAG agent, but some of them have since changed. I list the changes using git.
Is there an easy way to CRUD the embeddings without starting from scratch?

ilvalerione · 2025-08-10T15:35:08Z

ilvalerione
Aug 10, 2025
Maintainer

Neuron stores documents in the vector store with some additional fields, such as sourceName and sourceType.

These fields can be used to remember which file or document the chunks come from.

You can automatically call for a reindex in your RAG Agent:

https://docs.neuron-ai.dev/components/data-loader#reindex-knowledge-source

5 replies

bilogic Aug 11, 2025
Author

thanks, from what I can tell here https://github.com/inspector-apm/neuron-ai/blob/36eafe9541344107ea7766e5b0989dad91605ac1/src/RAG/RAG.php#L215

reindexBySource does not seem to make incremental changes (I think it sends every document to the LLM again). Or are you saying that I should pass in only new and changed documents?

ilvalerione Aug 11, 2025
Maintainer

You can pass a list of documents. Those with the same sourceType and sourceName will be updated, others will be stored as new documents.
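To make the matching rule above concrete, here is a minimal toy model of "replace by source". It is a sketch only and not Neuron's vector store: the class `InMemoryStore` and its methods `replaceBySource` and `chunksFor` are hypothetical names invented for illustration, and documents are plain arrays rather than Neuron document objects. It assumes that every chunk sharing a sourceType and sourceName is swapped out as a group, while sources absent from the batch are left alone.

```php
<?php
// Toy in-memory model of "replace by source". NOT Neuron's actual vector store;
// InMemoryStore, replaceBySource and chunksFor are hypothetical names.
final class InMemoryStore
{
    /** @var array<string, list<string>> chunks grouped by "sourceType:sourceName" */
    private array $chunksBySource = [];

    /**
     * @param list<array{sourceType: string, sourceName: string, content: string}> $documents
     */
    public function replaceBySource(array $documents): void
    {
        // Group the incoming chunks by their source key first.
        $incoming = [];
        foreach ($documents as $doc) {
            $key = $doc['sourceType'] . ':' . $doc['sourceName'];
            $incoming[$key][] = $doc['content'];
        }

        // Each source present in the batch is replaced wholesale;
        // sources that are not in the batch are left untouched.
        foreach ($incoming as $key => $chunks) {
            $this->chunksBySource[$key] = $chunks;
        }
    }

    /** @return list<string> */
    public function chunksFor(string $sourceType, string $sourceName): array
    {
        return $this->chunksBySource[$sourceType . ':' . $sourceName] ?? [];
    }
}
```

Under this model, passing only the new and changed files is enough: unchanged sources keep their existing chunks because they never appear in the batch.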

bilogic Aug 11, 2025
Author

Hmm perhaps let me elaborate, I have 4 types of documents:

a. New documents, passed in, stored as new documents
b. Updated documents, passed in, embeddings will be updated
c. Deleted documents (some old information might be wrong now)
d. Unchanged documents

I'm unclear how C and D are to be handled. Basically, if there are no changes, then I do not want to call the LLM API.

How do I tell NeuronAI about unchanged documents so that the LLM API is not called?
If the answer is don't pass it in, then how do I tell NeuronAI some documents have been deleted and to delete their embedding?
If I pass in unchanged documents, NeuronAI seems to reindex them. Is that necessary?

ilvalerione Aug 11, 2025
Maintainer

If a file has not changed, you should not run it through the data loaders at all. Deciding which files changed is an application problem.

I recommend structuring the RAG implementation with the following approach:

Execute the first data loading pipeline to process all available files, articles, etc., and use $rag->addDocuments(...) for the initial population of the vector store.
When new files are uploaded, call $rag->reindexBySource(...). If the file is an update of a previously processed file, its documents will be updated in the vector store. If it is a new file, its documents will be added as new.
If a file that was previously processed into the vector store is deleted, you can call $rag->resolveVectorStore()->deleteBySource(...) to remove all documents associated with the deleted file.
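Since the changes in this thread are listed with git, the application-side decision could be sketched as below. This is an illustration under assumptions, not part of Neuron: `planFromGitNameStatus` is a hypothetical helper, and it assumes the input is the tab-separated output of `git diff --name-status` (status letter, then one path, or two paths for renames and copies). It only decides which paths to reindex and which to delete; unchanged files never show up in the diff, so they are never sent to the embeddings provider.

```php
<?php
/**
 * Sort `git diff --name-status` output into "reindex" and "delete" buckets.
 * Hypothetical helper for illustration; it does not call Neuron itself.
 *
 * @return array{reindex: list<string>, delete: list<string>}
 */
function planFromGitNameStatus(string $output): array
{
    $plan = ['reindex' => [], 'delete' => []];

    foreach (preg_split('/\R/', trim($output)) as $line) {
        $parts = explode("\t", $line);
        if ($line === '' || count($parts) < 2) {
            continue; // skip blank or malformed lines
        }

        // Only the first letter matters: renames look like "R100", copies like "C75".
        switch ($parts[0][0]) {
            case 'A': // added
            case 'M': // modified
                $plan['reindex'][] = $parts[1];
                break;
            case 'D': // deleted
                $plan['delete'][] = $parts[1];
                break;
            case 'R': // renamed: the old path goes away, the new path is indexed
                if (isset($parts[2])) {
                    $plan['delete'][] = $parts[1];
                    $plan['reindex'][] = $parts[2];
                }
                break;
            case 'C': // copied: the source is unchanged, only the copy is new
                if (isset($parts[2])) {
                    $plan['reindex'][] = $parts[2];
                }
                break;
        }
    }

    return $plan;
}
```

The "reindex" bucket would then be loaded and passed to $rag->reindexBySource(...), and each path in the "delete" bucket handed to $rag->resolveVectorStore()->deleteBySource(...). The exact arguments of both calls are elided in this thread, so check them against the Neuron documentation before wiring this up.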

Answer selected by bilogic

bilogic Aug 11, 2025
Author

wonderful thanks!
