
Nhse o34 leveled.i433 enhancequery#51

Merged
martinsumner merged 42 commits into openriak-3.4 from
nhse-o34-leveled.i433-enhancequery
Nov 7, 2025

@martinsumner
Contributor

New Query API for Riak, intended to replace the existing use of secondary indexes.

The API will be usable via HTTP only in this release, and only supported in the Erlang HTTP client - OpenRiak/riak-erlang-http-client#2.

Outstanding work is documentation related:

  • Provide a much simpler example, and offer it first;
  • Add performance information based on test rig and population sample data;
  • Add examples for PUT (i.e. simple examples of how to update indexes).

Riak Test.

Add support for new query API.

The riak_kv_query module allows a query to be constructed with validation checks.

The riak_kv_query_server is a gen_server for managing a query. The coverage_fsm behaviour is not used, to aid future migration from the deprecated fsm.

There is a riak_kv_query_buffer added to replace the use of riak_kv_fold_buffer - which assumes the accumulator is a list.
Resolves OTP 24 dialyzer complaints after clarifying specs in backend behaviour
upstream meck has updated master, and no longer supports OTP < 25
Have both vnode and node stats (as with other operations)
Add HTTP API for Query - allowing for POST of JSON with JSON response.
The keys may be read in any order - the keys are not ordered with the term.

Likewise with term_with_keys as the term may now be extracted from within the term (i.e. a projected attribute)
Ensures that the message from the query server back to the client is a binary reference pass not a copy.
The de-duplication of keys using lists:umerge/2 does not scale effectively. Now, beyond a limit, a temporary ets table is used instead for de-duplication.
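
The switch of de-duplication strategy described above can be sketched roughly as follows. This is an illustrative Python sketch, not the riak_kv code: `umerge` stands in for lists:umerge/2, a Python `set` stands in for the temporary ets table, and `KeyAccumulator` and `DEDUP_SET_LIMIT` are hypothetical names and values (the actual limit is not stated here).

```python
from heapq import merge

# Hypothetical threshold; the real limit used in riak_kv is not stated in this PR.
DEDUP_SET_LIMIT = 1000


def umerge(left, right):
    """Merge two sorted, duplicate-free lists into one sorted, duplicate-free list."""
    merged = []
    for key in merge(left, right):
        if not merged or merged[-1] != key:
            merged.append(key)
    return merged


class KeyAccumulator:
    """Collects batches of keys, changing strategy once the result grows large."""

    def __init__(self, limit=DEDUP_SET_LIMIT):
        self.limit = limit
        self.sorted_keys = []  # used while the result is small
        self.key_set = None    # stand-in for the temporary ets table

    def add_batch(self, batch):
        if self.key_set is not None:
            # Large result: inserting into the set de-duplicates without re-merging.
            self.key_set.update(batch)
            return
        # Small result: merge the sorted, de-duplicated batch into the sorted list.
        self.sorted_keys = umerge(self.sorted_keys, sorted(set(batch)))
        if len(self.sorted_keys) > self.limit:
            self.key_set = set(self.sorted_keys)
            self.sorted_keys = []

    def result(self):
        """Return all keys seen so far, sorted and without duplicates."""
        if self.key_set is not None:
            return sorted(self.key_set)
        return self.sorted_keys
```

The merge path re-walks the whole accumulated list for every batch, which is why it becomes expensive as the result grows; the set path only pays for the new keys and sorts once at the end.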

Still, the performance was slower than current 2i for large key sets (> 10K keys). This is because riak_kv_query_server is the bottleneck - it cannot ACK batches of keys whilst it is handling previous batches. In contrast, the existing 2i simply forwards batches immediately - so that any processing delay is on the riak_client process queue, not on the query coordinator.

To help with this performance issue, the default batch size is increased, and also jittered to help avoid all vnode workers sending concurrently.
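
As a rough illustration of jittering the batch size, the sketch below draws each worker's batch size from a range around a base value, so that workers producing keys at a similar rate do not all fill and send a batch at the same moment. This is Python rather than the Erlang in riak_kv, and `BASE_BATCH_SIZE`, `JITTER_FRACTION`, `jittered_batch_size` and `batches` are hypothetical names and values; the real default and jitter range are not stated here.

```python
import random

# Hypothetical values; the real default batch size and jitter are not stated in this PR.
BASE_BATCH_SIZE = 1000
JITTER_FRACTION = 0.2


def jittered_batch_size(base=BASE_BATCH_SIZE, fraction=JITTER_FRACTION, rng=random):
    """Return a batch size drawn uniformly from base +/- (base * fraction)."""
    spread = int(base * fraction)
    return base + rng.randint(-spread, spread)


def batches(keys, batch_size):
    """Yield successive slices of keys, each at most batch_size long."""
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]
```

Each simulated worker would call `jittered_batch_size()` once and then use `batches(keys, size)` to split its results, so the points at which workers send to the coordinator drift apart.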

The use of an ets table requires every key to be wrapped in a tuple. This is done at source (in the buffer), to avoid a further delay on the query server running a lists:map/2 function. The JSON encoder will handle the tuple wrapping of the key (although it could be efficiently removed in the ets case using a select statement rather than tab2list).
Both keys and term_with_keys are now sorted and de-duplicated, and using a consistent code path.
Add unnecessary noise to change
Using the character not the string speeds up iolist_to_binary
Play around to resolve markdown parsing issue
@martinsumner martinsumner self-assigned this Sep 3, 2025
@martinsumner martinsumner moved this to Ready in OpenRiak 3.4 Sep 3, 2025
@martinsumner martinsumner moved this from Ready to In progress in OpenRiak 3.4 Sep 3, 2025
martinsumner added a commit that referenced this pull request Oct 27, 2025
Further summary pages to describe best practice for Riak 3.4.

Where possible, reference to specific commands is avoided. Any external product is only referenced when it is known to have been tested.

The aim is to prefer accuracy and maintainability of documentation, at the expense of completeness. Brevity is preferred to providing detailed instructions to inexpert users.

It is hoped that seven pages may provide a guide to a legacy-free description of Riak 3.4:
- InitialDesignDecisions (#61);
- InstallAndStartGuide;
- BuildAndScaleClusterGuide;
- ObjectAPI (TODO);
- QueryAPI (#51);
- NextGenREPL - GettingStarted;
- OperationsGuide (TODO).
As suggested in review.

TODO left to finish performance section.
@martinsumner martinsumner marked this pull request as ready for review November 5, 2025 20:10
Updated RHC is not required internally within riak_kv - just by riak_test.
@martinsumner martinsumner moved this from In progress to In review in OpenRiak 3.4 Nov 5, 2025
@martinsumner martinsumner merged commit a3333b8 into openriak-3.4 Nov 7, 2025
2 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in OpenRiak 3.4 Nov 7, 2025
@martinsumner martinsumner deleted the nhse-o34-leveled.i433-enhancequery branch November 7, 2025 19:23
amitgaru pushed a commit to amitgaru/riak_kv that referenced this pull request Jan 22, 2026
The default has been recommended for several years, and major customers have used it.  However, the recommendation was only made via release notes so may have been missed by some.

When the change was first made, there was some nervousness about the potential scope of the change and its irreversibility - hence why it was not immediately the default. However, sufficient time has passed to put aside those concerns.