You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
the main scalability issue of put_partial is that we copy all resources, this is not required for the new scheduler.
We could improve the scalability of this code path a lot by updating the underlying data structure
This requires changes to
database schema
put_partial and put_version
increment calculation?
scheduler (see draft below)
...?
Scheduler changes
The impact on the scheduler should be fairly minimal, as I tried to take this functionality into account when implementing the version diffing. Since I might be off when this design / implementation is picked up, and I know the scheduler internals best, I thought it best to include a draft implementation of the scheduler changes required to support partial exports.
The draft implementation below assumes that ModelVersion.from_db_records constructs a model version with either all resource sets (full version) or one or more resource sets (including the shared one), and all resources that live in those sets. If the design turns out to need something different, the implementation may need to change or simply be discarded. Whoever picks up this ticket is of course free to deviate from the draft in any way they see fit.
diff --git a/src/inmanta/deploy/scheduler.py b/src/inmanta/deploy/scheduler.py
index c51cc67bc..01e755ad4 100644
--- a/src/inmanta/deploy/scheduler.py+++ b/src/inmanta/deploy/scheduler.py@@ -134,8 +134,10 @@ class ModelVersion:
version: int
resources: Mapping[ResourceIdStr, ResourceIntent]
+ resource_sets: Mapping[Optional[str], Set[ResourceIdStr]] # must always contain at least the shared set
requires: Mapping[ResourceIdStr, Set[ResourceIdStr]]
undefined: Set[ResourceIdStr]
+ partial: bool
@classmethod
def from_db_records(cls: type[Self], version: int, resources: Collection[ResourceRecord]) -> Self:
@@ -149,6 +151,10 @@ class ModelVersion:
)
for resource in resources
},
+ resource_sets={+ # everything in shared set => TODO+ None: {resource["resource_id"] for resource in resources}+ },
requires={
ResourceIdStr(resource["resource_id"]): {
Id.parse_id(req).resource_str() for req in resource["attributes"].get("requires", [])
@@ -759,16 +765,37 @@ class ResourceScheduler(TaskManager):
version: int
intent: dict[ResourceIdStr, ResourceIntent] = {}
+ resource_sets: dict[Optional[str], Set[ResourceIdStr]] = {}
resource_requires: dict[ResourceIdStr, Set[ResourceIdStr]] = {}
# keep track of all new resource intent and all changes of intent relative to the currently managed version
intent_changes: dict[ResourceIdStr, ResourceIntentChange] = {}
# all undefined resources in the new model versions. Newer version information overrides older ones.
undefined: set[ResourceIdStr] = set()
+ # True iff the above represents a partial view on the intent, i.e. relative to the current state's intent.+ # False if it represents all intent+ partial: bool = True
for model in new_versions:
version = model.version
- # resources that don't exist anymore in this version- for resource in (self._state.intent.keys() | intent.keys()) - model.resources.keys():++ # Update resource sets and check for deleted resources+ known_resources: Set[ResourceIdStr]+ if model.partial:+ known_resources = (+ # get resources in set from tracked resource set, falling back to state's resource set if appropriate+ resource_sets.get(resource_set, self._state.resources_sets.get(resource_set, set()) if partial else set())+ for resource_set, resources in model.resource_sets.items()+ )+ # update resource sets mapping+ for resource_set, resources in model.resource_sets.items():+ resource_sets[resource_set] = resources+ else:+ known_resources = intent.keys() | (self._state.intent.keys() if partial else set())+ # from now on we track all resource sets+ partial = False+ resource_sets = dict(model.resource_sets)++ for resource in known_resources - model.resources.keys():
with contextlib.suppress(KeyError):
del intent[resource]
del resource_requires[resource]
@@ -830,8 +857,10 @@ class ResourceScheduler(TaskManager):
ModelVersion(
version=version,
resources=intent,
+ resource_sets=resource_sets,
requires=resource_requires,
undefined=undefined,
+ partial=partial,
),
intent_changes,
)
@@ -853,7 +882,8 @@ class ResourceScheduler(TaskManager):
:param up_to_date_resources: Set of resources that are to be considered in an assumed good state due to a previously
successful deploy for the same intent. Should not include any blocked resources. Mostly intended for the very first
- version when the scheduler is started with a fresh state.+ version when the scheduler is started with a fresh state. Must be present in the latest version in the+ new versions list.
:param last_deploy_time: last known deploy time of all resources. only intended for the very first
version when the scheduler is started with a fresh state.
It must contain ALL resources as it is used to start the timers
@@ -959,8 +989,13 @@ class ResourceScheduler(TaskManager):
# and has to be even apart from motivations 2-3, it may interleave with
# 2. clarity: it clearly signifies that this is the atomic and performance-sensitive part
async with self._scheduler_lock:
- # update model version+ # update model version and resource set mapping
self._state.version = model.version
+ if model.partial:+ for resource_set, resources in model.resource_sets.items():+ self._state.resource_sets[resource_set] = resources+ else:+ self._state.resource_sets = dict(model.resource_sets)
# update resource intent
for resource in up_to_date_resources:
# Registers resource and removes from the dirty set
diff --git a/src/inmanta/deploy/state.py b/src/inmanta/deploy/state.py
index 9c512f2e2..a1c1e4280 100644
--- a/src/inmanta/deploy/state.py+++ b/src/inmanta/deploy/state.py@@ -197,6 +197,7 @@ class ModelState:
version: int
intent: dict["ResourceIdStr", ResourceIntent] = dataclasses.field(default_factory=dict)
+ resource_sets: dict[Optional[str], "ResourceIdStr"] = dataclasses.field(default_factory=dict)
requires: RequiresProvidesMapping = dataclasses.field(default_factory=RequiresProvidesMapping)
resource_state: dict["ResourceIdStr", ResourceState] = dataclasses.field(default_factory=dict)
# resources with a known or assumed difference between intent and actual state
The text was updated successfully, but these errors were encountered:
the main scalability issue of put_partial is that we copy all resources, this is not required for the new scheduler.
We could improve the scalability of this code path a lot by updating the underlying data structure
This requires changes to
Scheduler changes
The impact on the scheduler should be fairly minimal, as I tried to take this functionality into account when implementing the version diffing. Since I might be off when this design / implementation is picked up, and I know the scheduler internals best, I thought it best to include a draft implementation of the scheduler changes required to support partial exports.
The draft implementation below assumes that
ModelVersion.from_db_records
constructs a model version with either all resource sets (full version) or one or more resource sets (including the shared one), and all resources that live in those sets. If the design turns out to need something different, the implementation may need to change or simply be discarded. Whoever picks up this ticket is of course free to deviate from the draft in any way they see fit.The text was updated successfully, but these errors were encountered: