Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve put_partial #8730

Open
sanderr opened this issue Feb 11, 2025 · 0 comments
Open

improve put_partial #8730

sanderr opened this issue Feb 11, 2025 · 0 comments
Labels
design Issues with a design aspect

Comments

@sanderr
Copy link
Contributor

sanderr commented Feb 11, 2025

the main scalability issue of put_partial is that we copy all resources, this is not required for the new scheduler.
We could improve the scalability of this code path a lot by updating the underlying data structure

This requires changes to

  • database schema
  • put_partial and put_version
  • increment calculation?
  • scheduler (see draft below)
  • ...?

Scheduler changes

The impact on the scheduler should be fairly minimal, as I tried to take this functionality into account when implementing the version diffing. Since I might be off when this design / implementation is picked up, and I know the scheduler internals best, I thought it best to include a draft implementation of the scheduler changes required to support partial exports.

The draft implementation below assumes that ModelVersion.from_db_records constructs a model version with either all resource sets (full version) or one or more resource sets (including the shared one), and all resources that live in those sets. If the design turns out to need something different, the implementation may need to change or simply be discarded. Whoever picks up this ticket is of course free to deviate from the draft in any way they see fit.

diff --git a/src/inmanta/deploy/scheduler.py b/src/inmanta/deploy/scheduler.py
index c51cc67bc..01e755ad4 100644
--- a/src/inmanta/deploy/scheduler.py
+++ b/src/inmanta/deploy/scheduler.py
@@ -134,8 +134,10 @@ class ModelVersion:

     version: int
     resources: Mapping[ResourceIdStr, ResourceIntent]
+    resource_sets: Mapping[Optional[str], Set[ResourceIdStr]]  # must always contain at least the shared set
     requires: Mapping[ResourceIdStr, Set[ResourceIdStr]]
     undefined: Set[ResourceIdStr]
+    partial: bool

     @classmethod
     def from_db_records(cls: type[Self], version: int, resources: Collection[ResourceRecord]) -> Self:
@@ -149,6 +151,10 @@ class ModelVersion:
                 )
                 for resource in resources
             },
+            resource_sets={
+                # everything in shared set => TODO
+                None: {resource["resource_id"] for resource in resources}
+            },
             requires={
                 ResourceIdStr(resource["resource_id"]): {
                     Id.parse_id(req).resource_str() for req in resource["attributes"].get("requires", [])
@@ -759,16 +765,37 @@ class ResourceScheduler(TaskManager):

         version: int
         intent: dict[ResourceIdStr, ResourceIntent] = {}
+        resource_sets: dict[Optional[str], Set[ResourceIdStr]] = {}
         resource_requires: dict[ResourceIdStr, Set[ResourceIdStr]] = {}
         # keep track of all new resource intent and all changes of intent relative to the currently managed version
         intent_changes: dict[ResourceIdStr, ResourceIntentChange] = {}
         # all undefined resources in the new model versions. Newer version information overrides older ones.
         undefined: set[ResourceIdStr] = set()
+        # True iff the above represents a partial view on the intent, i.e. relative to the current state's intent.
+        # False if it represents all intent
+        partial: bool = True

         for model in new_versions:
             version = model.version
-            # resources that don't exist anymore in this version
-            for resource in (self._state.intent.keys() | intent.keys()) - model.resources.keys():
+
+            # Update resource sets and check for deleted resources
+            known_resources: Set[ResourceIdStr]
+            if model.partial:
+                known_resources = (
+                    # get resources in set from tracked resource set, falling back to state's resource set if appropriate
+                    resource_sets.get(resource_set, self._state.resources_sets.get(resource_set, set()) if partial else set())
+                    for resource_set, resources in model.resource_sets.items()
+                )
+                # update resource sets mapping
+                for resource_set, resources in model.resource_sets.items():
+                    resource_sets[resource_set] = resources
+            else:
+                known_resources = intent.keys() | (self._state.intent.keys() if partial else set())
+                # from now on we track all resource sets
+                partial = False
+                resource_sets = dict(model.resource_sets)
+
+            for resource in known_resources - model.resources.keys():
                 with contextlib.suppress(KeyError):
                     del intent[resource]
                     del resource_requires[resource]
@@ -830,8 +857,10 @@ class ResourceScheduler(TaskManager):
             ModelVersion(
                 version=version,
                 resources=intent,
+                resource_sets=resource_sets,
                 requires=resource_requires,
                 undefined=undefined,
+                partial=partial,
             ),
             intent_changes,
         )
@@ -853,7 +882,8 @@ class ResourceScheduler(TaskManager):

         :param up_to_date_resources: Set of resources that are to be considered in an assumed good state due to a previously
             successful deploy for the same intent. Should not include any blocked resources. Mostly intended for the very first
-            version when the scheduler is started with a fresh state.
+            version when the scheduler is started with a fresh state. Must be present in the latest version in the
+            new versions list.
         :param last_deploy_time: last known deploy time of all resources. only intended for the very first
             version when the scheduler is started with a fresh state.
             It must contain ALL resources as it is used to start the timers
@@ -959,8 +989,13 @@ class ResourceScheduler(TaskManager):
         #   and has to be even apart from motivations 2-3, it may interleave with
         # 2. clarity: it clearly signifies that this is the atomic and performance-sensitive part
         async with self._scheduler_lock:
-            # update model version
+            # update model version and resource set mapping
             self._state.version = model.version
+            if model.partial:
+                for resource_set, resources in model.resource_sets.items():
+                    self._state.resource_sets[resource_set] = resources
+            else:
+                self._state.resource_sets = dict(model.resource_sets)
             # update resource intent
             for resource in up_to_date_resources:
                 # Registers resource and removes from the dirty set
diff --git a/src/inmanta/deploy/state.py b/src/inmanta/deploy/state.py
index 9c512f2e2..a1c1e4280 100644
--- a/src/inmanta/deploy/state.py
+++ b/src/inmanta/deploy/state.py
@@ -197,6 +197,7 @@ class ModelState:

     version: int
     intent: dict["ResourceIdStr", ResourceIntent] = dataclasses.field(default_factory=dict)
+    resource_sets: dict[Optional[str], "ResourceIdStr"] = dataclasses.field(default_factory=dict)
     requires: RequiresProvidesMapping = dataclasses.field(default_factory=RequiresProvidesMapping)
     resource_state: dict["ResourceIdStr", ResourceState] = dataclasses.field(default_factory=dict)
     # resources with a known or assumed difference between intent and actual state
@sanderr sanderr added the design Issues with a design aspect label Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Issues with a design aspect
Projects
None yet
Development

No branches or pull requests

1 participant