Extend CLI `document get` to fetch multiple documents. #33071

wix-mikej · 2025-01-03T01:35:27Z

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

We had a use case where we needed to pull many individual documents from one Vespa instance for relay to another. The existing document get mechanism incurred a lot of process overhead when retrieving multiple documents, so I adapted the client to accept multiple IDs and process them serially. It loops until all specified document retrievals are completed; it does not halt early on server errors or missing documents, as the original implementation did not.

mpolden · 2025-01-06T08:50:22Z

client/go/internal/cli/cmd/document.go

+
+	for _, docId := range parsedIds {
+		result := client.Get(docId, fieldSet)
+		printResult(cli, operationResult(true, document.Document{Id: docId}, service, result), true)


printResult can return an error so this should continue returning it to the caller.

I took the liberty of fixing this myself.

@mpolden Because (in our use case) we wanted it to process all documents, even if some fail (due to not existing), I was intentionally not exiting early. We would let printResult log the failure and continue. There is an argument to be made that the document get was successful in this case; there was just no data.

Right, how about this then? We print all documents but return an error if at least one read failed so that the command exits with non-zero.

If the client returns a non-zero status, that would indicate to our runner process that it failed (and it would be retried).

I don't know if there are other non-critical errors that could get bubbled up, but perhaps it only makes sense to continue on 404s? In that case, what about making it opt-in with a flag, something like --continue-on-missing or similar?

It sounds like vespa visit is a better fit for your use case. It will also be faster. For example:

$ vespa visit --selection 'id = "id:mynamespace:music::a-head-full-of-dreams" OR id = "id:mynamespace:music::hardwired-to-self-destruct" OR id = "id:mynamespace:music::foobar"'
{"id":"id:mynamespace:music::a-head-full-of-dreams","fields":{"artist":"Coldplay","year":2015,"category_scores":{"type":"tensor<float>(cat{})","cells":{"pop":1.0,"rock":0.20000000298023224,"jazz":0.0}},"album":"A Head Full of Dreams"}}
{"id":"id:mynamespace:music::hardwired-to-self-destruct","fields":{"artist":"Metallica","year":2016,"category_scores":{"type":"tensor<float>(cat{})","cells":{"pop":0.0,"rock":1.0,"jazz":0.0}},"album":"Hardwired...To Self-Destruct"}}

You can even pipe this directly to vespa feed to relay the documents to another instance: `vespa visit -a tenant.app.instance1 ... | vespa feed -a tenant.app.instance2 -`.

See https://docs.vespa.ai/en/vespa-cli.html#cheat-sheet and https://docs.vespa.ai/en/reference/document-select-language.html.

Using visit definitely would have made the job easier, had it worked in our use case. I don't know/understand all the details, but you can chat with @kkraune if you want to get the full rundown. I will take a stab at a PR to allow what I was proposing, and you can determine if it's worth merging at that point.

kkraune · 2025-01-09T13:04:35Z

There are good cases for a multi-get, so I think it makes sense to add this.

Extend CLI document get to fetch multiple documents.

a223915

kkraune requested a review from mpolden January 3, 2025 08:06

mpolden reviewed Jan 6, 2025

Return error for get operation

6f4d95e

mpolden approved these changes Jan 6, 2025

mpolden merged commit f876534 into vespa-engine:master Jan 6, 2025
1 check passed

wix-mikej deleted the multi-get branch January 8, 2025 19:05

wix-mikej commented Jan 3, 2025

mpolden Jan 6, 2025

mpolden Jan 6, 2025

wix-mikej Jan 6, 2025

mpolden Jan 7, 2025

wix-mikej Jan 7, 2025

mpolden Jan 8, 2025 (edited)

wix-mikej Jan 8, 2025

kkraune commented Jan 9, 2025
