Skip to content

Conversation

gzdunek
Copy link
Contributor

@gzdunek gzdunek commented Sep 26, 2025

Contributes to #25806

Part 2/2 of moving cluster state to the main process.

Centralizing the cluster state in the main process makes it easier to read it and update in both the main and the renderer processes.

Originally, I had planned to refactor the overall shape of the state as well, but that turned out to be a really large change. So for now, I've only moved the necessary parts of the state to the main process, and aimed to keep everything working as before.

The three main behaviors I wanted to maintain are:

  1. State is updated before the handlers resolve
    This is achieved using AwaitableSender, which ensures that the renderer acknowledges each update message before the handler completes. This maintains the previous assumption that the state is immediately available after an update.
  2. useStateSelector can still work effectively
    In the previous implementation, calling setState only updated the parts of the state that actually changed, allowing useStateSelector to avoid unnecessary re-renders.
    If we were to send the full state over IPC, it would be serialized with structuredClone, resulting in a new object each time, which would break referential stability and make useStateSelector useless.
    To avoid this, we now generate patches in the main process (using immer) and apply them in the renderer. This preserves object references where possible and keeps useStateSelector working effectively. This feature is probably not super important, I'm treating it as a nice to have, as it was easy to achieve.
  3. Errors are still handled in the notifications. This is done by invoking all the functions from the renderer, so we get the errors back. I'd still want to change that and something like error field to the cluster state, I'm going to refactor the state shape in the future.

@gzdunek gzdunek requested review from avatus and ravicious September 26, 2025 12:27
@gzdunek gzdunek added the no-changelog Indicates that a PR does not require a changelog entry label Sep 26, 2025
@gzdunek gzdunek force-pushed the gzdunek/cluster-store branch from a61b146 to 3b0f6fc Compare September 26, 2025 12:34
@ravicious ravicious self-requested a review October 7, 2025 16:27
@gzdunek gzdunek force-pushed the gzdunek/awaitable-sender branch from c9a9402 to 87fda96 Compare October 13, 2025 12:32
@gzdunek gzdunek force-pushed the gzdunek/cluster-store branch from 3b0f6fc to 3e4c985 Compare October 13, 2025 12:33
@gzdunek gzdunek force-pushed the gzdunek/awaitable-sender branch from 87fda96 to 032b4ac Compare October 13, 2025 13:36
@gzdunek gzdunek force-pushed the gzdunek/cluster-store branch from 3e4c985 to 67d5a63 Compare October 13, 2025 13:37
@gzdunek gzdunek force-pushed the gzdunek/cluster-store branch from 67d5a63 to 6669dd1 Compare October 15, 2025 12:00
@gzdunek gzdunek requested a review from ravicious October 15, 2025 12:07
Copy link
Member

@ravicious ravicious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a bug related to the proxy host allow list.

// The workaround is to update the field in case of a failure,
// so the places that wait for showResources !== UNSPECIFIED don't get stuck indefinitely.
cluster.showResources = ShowResources.ACCESSIBLE_ONLY;
private subscribeToClusterStore(): void {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it makes sense to add tests which check if subscribeToClusterStore indeed preserves identity? I was thinking that it is something that can easily slip past us, OTOH… Is it even possible for it to not preserve identity if it's backed by Immer? This also got me thinking and I came to the conclusion that this statement:

If we were to send the full state over IPC, it would be serialized with structuredClone, resulting in a new object each time, which would break referential stability and make useStateSelector useless.

is not necessarily correct. If we were sending the full state, then Immer would still take care to change only those parts of the state that actually need to change. It could just end up being super expensive. I remember Bartosz had some performance problems initially when he switched to Immer in the role editor.

With that said I suppose I answered my one question: we don't need those identity tests because it's impossible for subscribeToClusterStore to not preserve object identity.

Funny that I wrote this comment:

// It doesn't appear to be explicitly documented anywhere, but Immer preserves object
// identity, so Object.is works as expected. This behavior is covered by our tests.
const hasSelectedStateChanged = !Object.is(
newSelectedState,
selectedState
);

This feature is probably not super important, I'm treating it as a nice to have, as it was easy to achieve.

What do you mean it's not super important? You mean keeping useStateSelector working? I feel like it's super important because otherwise many places in the app would start re-rendering way more often than necessary! 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were sending the full state, then Immer would still take care to change only those parts of the state that actually need to change. It could just end up being super expensive. I remember Bartosz had some performance problems initially when he switched to Immer in the role editor.

I'm not sure if I understand.
If were sending a full state, it would be applied in the renderer in the following way:

      this.setState(c => {
          c.clusters = castDraft(e.value);
      })

Now, if we send an update with the full state, but with one cluster having flipped the connected flag, would Immer really be that smart to only modify that one flag? I was under impression that it would just always replace c.clusters with a new value.

In the docs there's a following example:

        case "adduser-3":
            // OK: returning a new state. But, unnecessary complex and expensive
            return {
                userCount: draft.userCount + 1,
                users: [...draft.users, action.payload]
            }

But here we merge the state manually.

With that said, I think we probably don't need a test for that, as long as we produce patches on the one side and consume it on the another. Immer will take of preserving the identity.

What do you mean it's not super important? You mean keeping useStateSelector working? I feel like it's super important because otherwise many places in the app would start re-rendering way more often than necessary! 😅

Ah, I was thinking we still use ClustersService.useState quite a lot, but actually it’s not that bad - there are only a few instances of it, and they’re not even that high up in the component tree. So indeed, it's important to keep useStateSelector working!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, if we send an update with the full state, but with one cluster having flipped the connected flag, would Immer really be that smart to only modify that one flag? I was under impression that it would just always replace c.clusters with a new value.

It turns out I was wrong, it'd indeed replace it with a new value.

I added this patch:

Patch
diff --git a/web/packages/teleterm/src/mainProcess/clusterStore/clusterStore.ts b/web/packages/teleterm/src/mainProcess/clusterStore/clusterStore.ts
index 985fd0658be..c9c74db872f 100644
--- a/web/packages/teleterm/src/mainProcess/clusterStore/clusterStore.ts
+++ b/web/packages/teleterm/src/mainProcess/clusterStore/clusterStore.ts
@@ -172,8 +172,8 @@ export class ClusterStore {
       this.senders.values().map(sender => {
         const send = this.withErrorHandling(update => sender.send(update));
         return send({
-          kind: 'patches',
-          value: patches,
+          kind: 'state',
+          value: this.state,
         });
       })
     );
diff --git a/web/packages/teleterm/src/ui/StatusBar/StatusBar.tsx b/web/packages/teleterm/src/ui/StatusBar/StatusBar.tsx
index a1d8576b4b7..5b6e0494f52 100644
--- a/web/packages/teleterm/src/ui/StatusBar/StatusBar.tsx
+++ b/web/packages/teleterm/src/ui/StatusBar/StatusBar.tsx
@@ -44,6 +44,8 @@ export function StatusBar(props: { onAssumedRolesClick(): void }) {
   const assumed = useAssumedRequests(rootClusterUri);
   const assumedRolesText = getAssumedRoles(assumed);
 
+  console.log('%c Rendering StatusBar', 'background: #222; color: #bada55');
+
   return (
     <Flex
       width="100%"

You can see that the status bar does not re-render when I close a gateway (thus updating only the gateways part of the clusters service). But when I log out of a cluster, the states bar does get re-rendered.

Rendering with full state updates
rendering.with.full.state.mov

This does not happen when only patches are sent through:

Rendering with patches
rendering.with.patches.mov

allowList.clear();
for (const rootCluster of rootClusters) {
if (rootCluster.proxyHost) {
export function makeRootClusterProxyHostAllowList(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is wrong with the new implementation, because when I try to launch an app from Connect, I get an uncaught error:

TypeError: Cannot read properties of undefined (reading 'clusterStore')
at isUrlSafe

It'd be nice to have an integration test for to make sure that the allow list works. Though off the top of my head I'm not sure how it'd work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, by a mistake I tried to read .clusterStore from this, instead of mainProcess.

It'd be nice to have an integration test for to make sure that the allow list works.

I added a semi-integration test by extracting a handler registered in app.on('web-contents-created') so I could call it in tests. Let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still get an error:

TypeError: Cannot read properties of undefined (reading 'getRootClusters')
at isUrlSafe (/Users/rav/src/teleport/web/packages/teleterm/build/app/main/index.js:54706:18)
at WebContents._windowOpenHandler (/Users/rav/src/teleport/web/packages/teleterm/build/app/main/index.js:54681:9)
at b._callWindowOpenHandler (node:electron/js2c/browser_init:2:85211)
at WebContents.<anonymous> (node:electron/js2c/browser_init:2:90531)
at WebContents.emit (node:events:518:28)

Is it because mainProcess.clusterStore gets created asynchronously? How are we going to make sure clusterStore is available before the first IPC call from the renderer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delaying ClusterStore initialization until getTshdClients resolves was asking for trouble 🙊
Sorry for not testing this.

@gzdunek gzdunek requested a review from ravicious October 17, 2025 13:37
);
});
this.clusterStore = new ClusterStore(
this.getTshdClients().then(c => c.terminalService),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The coordination between the processes is getting a little hectic and we don't have it well documented. If I were to look at MainProcess and ClusterStore with no knowledge about them, my questions would be:

  1. What if getTshdClients returns an error? How is the error surfaced to the user?
  2. If the renderer process directly depends on ClusterStore and ClusterStore depends on the result of a promise, what happens if that promise hangs indefinitely? Can the renderer reasonably assume that ClusterStore is ready when the renderer wants to talk to it?

The answer to both questions is concealed in the fact that both getTshdClients and the startup of the frontend app depend on the result of the same promise. The error from said promise is surfaced primarily in the UI of the renderer.

Could you add docs for getTshdClients and resolvedChildProcessAddresses that would provide some context behind this?

getTshdClients can also be made private now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

no-changelog Indicates that a PR does not require a changelog entry size/md ui

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants