Skip to content

Group experimental feature flags and expose them via /version#4693

Open
muhamadazmy wants to merge 1 commit intomainfrom
pr4693
Open

Group experimental feature flags and expose them via /version#4693
muhamadazmy wants to merge 1 commit intomainfrom
pr4693

Conversation

@muhamadazmy
Copy link
Copy Markdown
Contributor

Group experimental feature flags and expose them via /version

Summary:
Move the scattered experimental_enable_* config fields on
CommonOptions into a dedicated Experimental struct, generated by an
experimental! macro from a single list of feature names, and expose
their enabled/disabled state through the admin /version endpoint.

Adding a new flag is now a one-line change: declare the identifier in
the experimental! { ... } block and the macro emits the config field,
is_<name>_enabled() / set_<name>() accessors, and the /version
entry. Naming is split between the two surfaces:

  • the on-disk config / JSON schema keeps the explicit
    experimental_enable_<name> field, so existing configurations and
    the schema are unchanged;
  • /version reports the bare feature name (no experimental_enable_
    prefix) to keep the API output compact.

Example /version response:

    {
      "version": "1.7.0-dev",
      "min_admin_api_version": 2,
      "max_admin_api_version": 4,
      "ingress_endpoint": "http://127.0.0.1:8080/",
      "features": {
        "vqueues": false,
        "protocol_v7": true,
        "invoker_yield": false
      }
    }

Note: the reason this is done this way is because serde does not
allow dyanmic field renames

Summary:
Move the scattered `experimental_enable_*` config fields on
`CommonOptions` into a dedicated `Experimental` struct, generated by an
`experimental!` macro from a single list of feature names, and expose
their enabled/disabled state through the admin `/version` endpoint.

Adding a new flag is now a one-line change: declare the identifier in
the `experimental! { ... }` block and the macro emits the config field,
`is_<name>_enabled()` / `set_<name>()` accessors, and the `/version`
entry. Naming is split between the two surfaces:

- the on-disk config / JSON schema keeps the explicit
  `experimental_enable_<name>` field, so existing configurations and
  the schema are unchanged;
- `/version` reports the bare feature name (no `experimental_enable_`
  prefix) to keep the API output compact.

Example `/version` response:
```
    {
      "version": "1.7.0-dev",
      "min_admin_api_version": 2,
      "max_admin_api_version": 4,
      "ingress_endpoint": "http://127.0.0.1:8080/",
      "features": {
        "vqueues": false,
        "protocol_v7": true,
        "invoker_yield": false
      }
    }
```

Note: the reason this is done this way is because serde does not
allow dyanmic field renames
@muhamadazmy
Copy link
Copy Markdown
Contributor Author

A bit of context here:

I initially grouped all experimental features under an Experimental struct. However, that required renaming the flags to avoid names like common.experimental.experimental_<feat>, which felt redundant.

As a result, the configuration keys no longer use the experimental_enable_ prefix. Unfortunately, serde doesn’t support dynamic renaming via a function, so I couldn’t handle this more elegantly.

At the same time, I wanted the /version endpoint to return only the feature names without any prefixes. To achieve this, I ended up using a macro to populate the feature list.

The macro ensures that any new feature flag is automatically included in the returned feature list with the correct name, while the config still exposes properly prefixed fields and convenient getters for use in code.

That said, I’m starting to wonder if this approach might be overengineered.

@muhamadazmy muhamadazmy requested a review from tillrohrmann May 5, 2026 11:23
@muhamadazmy muhamadazmy marked this pull request as ready for review May 5, 2026 11:27
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 5, 2026

Test Results

  7 files   - 1    7 suites   - 1   4m 31s ⏱️ + 2m 27s
 49 tests  - 1   49 ✅  - 1  0 💤 ±0  0 ❌ ±0 
210 runs   - 8  210 ✅  - 8  0 💤 ±0  0 ❌ ±0 

Results for commit dd5861e. ± Comparison against base commit 7bc1cb3.

This pull request removes 5 and adds 4 tests. Note that renamed tests count towards both.
dev.restate.sdktesting.tests.Custom ‑ run(CustomTestConfig, URI, URI)[1]
dev.restate.sdktesting.tests.UserErrors ‑ failSeveralTimesWithMetadata(URI)
dev.restate.sdktesting.tests.UserErrors ‑ internalCallFailurePropagationWithMetadata(URI)
dev.restate.sdktesting.tests.UserErrors ‑ invokeTerminallyFailingCallWithMetadata(URI)
dev.restate.sdktesting.tests.UserErrors ‑ sideEffectWithTerminalErrorWithMetadata(URI)
dev.restate.sdktesting.tests.KafkaIngress ‑ handleEventInCounterService(URI, int, Client)
dev.restate.sdktesting.tests.KafkaIngress ‑ handleEventInEventHandler(URI, int, Client)
dev.restate.sdktesting.tests.UpgradeWithInFlightInvocation ‑ inFlightInvocation(Client, URI)
dev.restate.sdktesting.tests.UpgradeWithNewInvocation ‑ executesNewInvocationWithLatestServiceRevisions(Client, URI)

♻️ This comment has been updated with latest results.

@AhmedSoliman
Copy link
Copy Markdown
Contributor

just a tiny detail. it's worth noting that vqueues could be enabled even if the experimental feature flag is off, this may happen if we are downgrading from v1.8. (cc. @tillrohrmann)

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd5861eb23

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

#[cfg_attr(feature = "schemars", schemars(skip))]
#[serde(skip_serializing_if = "std::ops::Not::not", default)]
pub experimental_allow_protocol_v7: bool,
protocol_v7,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve protocol-v7 flag name compatibility

Keep the protocol-v7 config field backward-compatible: declaring protocol_v7 here makes the macro generate experimental_enable_protocol_v7, but the previous public config key was experimental_allow_protocol_v7. Existing configs that still set the old key will no longer enable v7 after this change (unknown fields are ignored during deserialization), which can silently keep nodes on the old protocol behavior. Add an alias/compat path for the old name or keep this flag explicitly named.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The protocol_v7 is an unreleased feature so it's okay to rename the flag

@muhamadazmy
Copy link
Copy Markdown
Contributor Author

@AhmedSoliman

just a tiny detail. it's worth noting that vqueues could be enabled even if the experimental feature flag is off, this may happen if we are downgrading from v1.8. (cc. @tillrohrmann)

can you clarify this. Does this mean vqueues that exist already will continue to work but no new vqueues will be created ? or vqueues will be enabled by default in that case? and if yes, how does the system know it's enabled?

@muhamadazmy
Copy link
Copy Markdown
Contributor Author

Also forgot to mention that the serde rename attribute per field can only accept literal string (as #[serde(rename = "...")]) so using concat!() or stringify!() does not work. So i had to keep the fields names as experimental_enable_<field> and create getters/setter for the attributes.

@tillrohrmann
Copy link
Copy Markdown
Contributor

tillrohrmann commented May 5, 2026

can you clarify this. Does this mean vqueues that exist already will continue to work but no new vqueues will be created ? or vqueues will be enabled by default in that case? and if yes, how does the system know it's enabled?

While the details are not fully fleshed out, we expect that v1.8 will have to run some automatic migrations to migrate existing invocations to run on vqueues. If a user rolls back the system to v1.7 after the migration has run, then v1.7 will have to enable vqueues for the partitions that were migrated. Most likely, the partition store will have a flag that signals whether the it was migrated or not.

@tillrohrmann tillrohrmann linked an issue May 5, 2026 that may be closed by this pull request
@muhamadazmy
Copy link
Copy Markdown
Contributor Author

@tillrohrmann is the storage flag a release blocker for v1.7? or will it be introduced as a patch to v1.7 once migration process is planned for v1.8.

@tillrohrmann
Copy link
Copy Markdown
Contributor

I think this is something that we need to do for the v1.7 release.

Copy link
Copy Markdown
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding a more structured variant to organize our experimental flags @muhamadazmy. The changes look good to me.

Once we have clarified how exactly we communicate that vqueues are enabled in v1.8 and thereby also how v1.7 needs to respect it, we probably need to update how exactly we generate the features map that we return at the /version endpoint.

I guess an alternative to exposing whether vqueues are enabled or not and letting the UI use different code paths could be that we add something like a translation layer for the sys_inbox table which reads from the vqueues table. That way it might be transparent to the UI whether vqueues are used or not in v1.7.

.ingress
.advertised_address(tc.address_book())
})),
features: Configuration::pinned().common.experimental.features(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect that this feature map won't solely be created based on the Configuration as we might be rolling back from v1.8 where we ran the auto migration to enable vqueues. Once the migration has ran, v1.7 needs to use them as there is no way going back.

Since the migration will most likely run on a per partition basis and one might even end up with a partially migrated cluster when rolling back, I think that we are going to use the NodesConfiguration to signal to all nodes whether vqueues should be used or not. Details to be figured out.

let mut map = std::collections::HashMap::default();
$(
paste!{
map.insert(std::borrow::Cow::Borrowed(stringify!($field)), self.[<experimental_enable_ $field>]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the key of map be &'static str?

/// the scheduler instead of consuming retry attempts. Requires all nodes in the
/// cluster to be running v1.7.0 or later because it introduces a new WAL variant.
///
/// Since v1.6.3
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update to v1.7.0 as we never released v1.6.3.

/// where v7 is not supported < v1.7
#[cfg_attr(feature = "schemars", schemars(skip))]
#[serde(skip_serializing_if = "std::ops::Not::not", default)]
pub experimental_allow_protocol_v7: bool,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to update ci.yml to set the new env variable to enable the protocol v7.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make UI work with vqueues

3 participants