Extend CLI `document get` to fetch multiple documents. #33071
for _, docId := range parsedIds {
	result := client.Get(docId, fieldSet)
	printResult(cli, operationResult(true, document.Document{Id: docId}, service, result), true)
`printResult` can return an error, so this should continue returning it to the caller.
I took the liberty of fixing this myself.
@mpolden Because (in our use case) we wanted it to process all documents, even if some fail (because they do not exist), I was intentionally not exiting early. We would let `printResult` log it and continue. There is an argument to be made that the document get was successful in this case; there was just no data.
Right, how about this then? We print all documents but return an error if at least one read failed so that the command exits with non-zero.
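A minimal sketch of that behavior, assuming hypothetical stand-ins rather than the real Vespa client: `getResult`, `fakeGet`, and `getAll` are names invented for illustration. Every document is read and printed, and a single error is returned at the end if any read failed.

```go
package main

import "fmt"

// getResult is a hypothetical stand-in for the outcome of one document read.
type getResult struct {
	ID      string
	Success bool
	Message string
}

// fakeGet is a hypothetical stand-in for client.Get. It fails for any ID
// that is not present in the in-memory store.
func fakeGet(store map[string]string, id string) getResult {
	if body, ok := store[id]; ok {
		return getResult{ID: id, Success: true, Message: body}
	}
	return getResult{ID: id, Success: false, Message: "not found"}
}

// getAll reads every ID, prints each result, and returns a non-nil error
// if at least one read failed. It never stops early.
func getAll(store map[string]string, ids []string) error {
	failed := 0
	for _, id := range ids {
		r := fakeGet(store, id)
		fmt.Printf("%s: %s\n", r.ID, r.Message)
		if !r.Success {
			failed++
		}
	}
	if failed > 0 {
		return fmt.Errorf("%d of %d document reads failed", failed, len(ids))
	}
	return nil
}

func main() {
	store := map[string]string{"id:ns:music::a": `{"artist":"Coldplay"}`}
	ids := []string{"id:ns:music::a", "id:ns:music::missing"}
	if err := getAll(store, ids); err != nil {
		// A real CLI would return this error so the process exits non-zero.
		fmt.Println("error:", err)
	}
}
```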
If the client returns a non-zero status, that would indicate to our runner process that it failed (and it would be retried).
I don't know if there are other non-critical errors that could get bubbled up, but perhaps it only makes sense to continue on 404s? In that case, what about making it opt-in with a flag, something like `--continue-on-missing`?
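As a rough sketch of what I mean, with everything here hypothetical (`readResult` and `processAll` are made-up names, and the boolean stands in for the proposed flag): a 404 is skipped only when the caller opts in, and any other failure still stops the loop.

```go
package main

import "fmt"

// readResult is a hypothetical stand-in for the outcome of one read.
// Status is an HTTP-like code; 404 is assumed to mean "document missing".
type readResult struct {
	ID     string
	Status int
}

// processAll walks the results in order. With continueOnMissing set, a 404
// is recorded and skipped; any other non-200 status stops processing and is
// returned as an error.
func processAll(results []readResult, continueOnMissing bool) (missing []string, err error) {
	for _, r := range results {
		switch {
		case r.Status == 200:
			fmt.Println("ok:", r.ID)
		case r.Status == 404 && continueOnMissing:
			fmt.Println("missing:", r.ID)
			missing = append(missing, r.ID)
		default:
			return missing, fmt.Errorf("reading %s failed with status %d", r.ID, r.Status)
		}
	}
	return missing, nil
}

func main() {
	results := []readResult{
		{ID: "a", Status: 200},
		{ID: "b", Status: 404},
		{ID: "c", Status: 200},
	}
	// Opted in: the missing document is reported and the loop continues.
	missing, err := processAll(results, true)
	fmt.Println("missing:", missing, "err:", err)
	// Default: the first missing document aborts the run.
	_, err = processAll(results, false)
	fmt.Println("err:", err)
}
```

Keeping the default strict means existing callers would still see a non-zero exit on the first failure.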
It sounds like `vespa visit` is a better fit for your use case. It will also be faster. For example:
$ vespa visit --selection 'id = "id:mynamespace:music::a-head-full-of-dreams" OR id = "id:mynamespace:music::hardwired-to-self-destruct" OR id = "id:mynamespace:music::foobar"'
{"id":"id:mynamespace:music::a-head-full-of-dreams","fields":{"artist":"Coldplay","year":2015,"category_scores":{"type":"tensor<float>(cat{})","cells":{"pop":1.0,"rock":0.20000000298023224,"jazz":0.0}},"album":"A Head Full of Dreams"}}
{"id":"id:mynamespace:music::hardwired-to-self-destruct","fields":{"artist":"Metallica","year":2016,"category_scores":{"type":"tensor<float>(cat{})","cells":{"pop":0.0,"rock":1.0,"jazz":0.0}},"album":"Hardwired...To Self-Destruct"}}
You can even pipe this directly to `vespa feed` to relay the documents to another instance: `vespa visit -a tenant.app.instance1 ... | vespa feed -a tenant.app.instance2 -`.
See https://docs.vespa.ai/en/vespa-cli.html#cheat-sheet and https://docs.vespa.ai/en/reference/document-select-language.html.
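If the list of IDs is generated by a program, a selection of the shape used above could be assembled with a small helper. This is only a sketch: `selectionForIDs` is a hypothetical name, and it assumes the IDs contain no double quotes or backslashes, rejecting any that do rather than attempting to escape them.

```go
package main

import (
	"fmt"
	"strings"
)

// selectionForIDs builds a selection string of the form
// `id = "..." OR id = "..."` that matches any of the given document IDs.
// Assumption: IDs never contain double quotes or backslashes; such IDs are
// rejected instead of being escaped.
func selectionForIDs(ids []string) (string, error) {
	if len(ids) == 0 {
		return "", fmt.Errorf("no document ids given")
	}
	terms := make([]string, 0, len(ids))
	for _, id := range ids {
		if strings.ContainsAny(id, `"\`) {
			return "", fmt.Errorf("unsupported character in document id %q", id)
		}
		terms = append(terms, fmt.Sprintf(`id = "%s"`, id))
	}
	return strings.Join(terms, " OR "), nil
}

func main() {
	sel, err := selectionForIDs([]string{
		"id:mynamespace:music::a-head-full-of-dreams",
		"id:mynamespace:music::hardwired-to-self-destruct",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The result is meant to be passed as the value of --selection.
	fmt.Println(sel)
}
```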
Using `visit` definitely would have made the job easier, had it worked in our use case. I don't know/understand all the details, but you can chat with @kkraune if you want to get the full rundown. I will take a stab at a PR to allow what I was proposing, and you can determine if it's worth merging at that point.
There are good cases for a multi-get, so I think it makes sense to add this.
I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.
We had a use case where we needed to pull many individual documents from a Vespa instance for relay to another. The existing `document get` mechanism incurred a lot of process overhead to retrieve multiple documents, so I adapted the client to allow multiple IDs to be passed and processed serially. It will loop until all specified document retrievals are completed; it does not halt early for server errors or missing documents, as the original implementation did not.