feature/#397 scenario duplication #2373

Open
wants to merge 6 commits into
base: develop
from

Conversation

toan-quach
Member

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update

Description

Related Tickets & Documents

#397

@toan-quach toan-quach marked this pull request as draft December 27, 2024 07:17
@toan-quach toan-quach force-pushed the feature/#397-duplicate-scenarios branch from 658f02f to 2644479 Compare December 27, 2024 07:17
Contributor

github-actions bot commented Dec 27, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
| --- | --- | --- | --- | --- |
| 19683 | 17148 | 87% | 0% | 🟢 |

New Files

No new covered files...

Modified Files

| File | Coverage | Status |
| --- | --- | --- |
| taipy/core/data/_data_manager.py | 98% | 🟢 |
| taipy/core/data/_file_datanode_mixin.py | 98% | 🟢 |
| taipy/core/data/csv.py | 98% | 🟢 |
| taipy/core/data/data_node.py | 97% | 🟢 |
| taipy/core/data/excel.py | 84% | 🟢 |
| taipy/core/data/json.py | 97% | 🟢 |
| taipy/core/data/parquet.py | 97% | 🟢 |
| taipy/core/data/pickle.py | 100% | 🟢 |
| taipy/core/scenario/_scenario_manager.py | 97% | 🟢 |
| taipy/core/task/_task_manager.py | 97% | 🟢 |
| taipy/core/task/task.py | 100% | 🟢 |
| TOTAL | 97% | 🟢 |

updated for commit: 190f6a6 by action🐍

@jrobinAV jrobinAV added Core Related to Taipy Core Core: Data node Core: 🎬 Scenario & Cycle 🟨 Priority: Medium Not blocking but should be addressed labels Jan 6, 2025
@toan-quach toan-quach force-pushed the feature/#397-duplicate-scenarios branch from 2644479 to f79d591 Compare January 13, 2025 07:26
@toan-quach toan-quach marked this pull request as ready for review January 15, 2025 13:10
Comment on lines 537 to 538
cloned_scenario_id = cloned_scenario._new_id(cloned_scenario.config_id)
cloned_scenario.id = cloned_scenario_id
Member

Suggested change
- cloned_scenario_id = cloned_scenario._new_id(cloned_scenario.config_id)
- cloned_scenario.id = cloned_scenario_id
+ cloned_scenario.id = cloned_scenario._new_id(cloned_scenario.config_id)

Member

We should implement a can_duplicate method, returning reasons, just like we have a can_create method.
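
A minimal sketch of what such a check could look like, for illustration only. The `Reasons` class and the `can_duplicate` signature below are hypothetical stand-ins, not the Taipy API; the single existence check is an assumption about what might block a duplication.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Reasons:
    """Hypothetical reasons container: truthy when no blocking reason is recorded."""

    messages: List[str] = field(default_factory=list)

    def add(self, message: str) -> "Reasons":
        self.messages.append(message)
        return self

    def __bool__(self) -> bool:
        # An empty list of reasons means the operation is allowed.
        return not self.messages


def can_duplicate(scenario_id: str, existing_ids: Set[str]) -> Reasons:
    """Return a truthy Reasons object if the scenario can be duplicated."""
    reasons = Reasons()
    if scenario_id not in existing_ids:
        reasons.add(f"Scenario '{scenario_id}' does not exist in the repository")
    return reasons
```

A caller can then write `if can_duplicate(...)` and, when the result is falsy, show `messages` to the user.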

Comment on lines 187 to 195
cloned_dn = cls._get(dn)

cloned_dn.id = cloned_dn._new_id(cloned_dn._config_id)
cloned_dn._owner_id = cls._get_owner_id(cloned_dn._scope, cycle_id, scenario_id)
cloned_dn._parent_ids = set()

cls._set(cloned_dn)

cloned_dn._clone_data()
Member

If the data node has a cycle scope and the new scenario belongs to the same cycle, we want the scenarios to share the same data node.
If the scope is global, the data node always already exists, and we want to share the existing one as well.
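
The rule above can be sketched as a small decision function. This is an illustration under assumptions: the `Scope` enum is a local stand-in for the three scopes discussed here, and `should_share` is a hypothetical helper name.

```python
from enum import Enum
from typing import Optional


class Scope(Enum):
    # Local stand-in for the scopes discussed in this thread.
    SCENARIO = 1
    CYCLE = 2
    GLOBAL = 3


def should_share(
    scope: Scope,
    source_cycle_id: Optional[str],
    target_cycle_id: Optional[str],
) -> bool:
    """Return True when the new scenario should reuse the existing data node."""
    if scope is Scope.GLOBAL:
        # A global data node already exists and is shared by everyone.
        return True
    if scope is Scope.CYCLE:
        # Share only when both scenarios belong to the same cycle.
        return source_cycle_id is not None and source_cycle_id == target_cycle_id
    # Scenario-scoped data nodes are always duplicated.
    return False
```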

Comment on lines +227 to +235
if os.path.exists(self.path):
    folder_path, base_name = os.path.split(self.path)
    new_base_path = os.path.join(folder_path, f"TAIPY_CLONE_{id}_{base_name}")
    if os.path.isdir(self.path):
        shutil.copytree(self.path, new_base_path)
    else:
        shutil.copy(self.path, new_base_path)
    return new_base_path
return ""
Member

Do we want to differentiate between the cases where the initial path is generated by Taipy and where it is provided by the user?
I believe it would be better. If it is generated by Taipy, we can just replace the old id with the new one. Otherwise, adding a prefix or a suffix as you did makes sense.
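
A rough sketch of that distinction, assuming the caller already knows whether the path was generated (the `is_generated` flag and the `duplicated_path` name are hypothetical). The `TAIPY_CLONE_` prefix is the one used in the diff above.

```python
import os


def duplicated_path(path: str, old_id: str, new_id: str, is_generated: bool) -> str:
    """Compute the path of the duplicated data file without touching the disk."""
    folder, base_name = os.path.split(path)
    if is_generated and old_id in base_name:
        # Generated file names embed the data node id: swap the old id for the new one.
        return os.path.join(folder, base_name.replace(old_id, new_id))
    # User-provided names are kept and only prefixed.
    return os.path.join(folder, f"TAIPY_CLONE_{new_id}_{base_name}")
```

For example, a generated `DATANODE_a_1.csv` would become `DATANODE_a_2.csv`, while a user-provided `sales.csv` would become `TAIPY_CLONE_DATANODE_a_2_sales.csv`.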


cloned_additional_data_nodes = set()
for data_node in cloned_scenario.additional_data_nodes.values():
    cloned_additional_data_nodes.add(_data_manager._clone(data_node, None, cloned_scenario_id))
Member

Whether a data node is duplicated should depend on its scope.

Member

We should implement a can_duplicate method, returning reasons, just like we have a can_create method.

@@ -521,3 +521,47 @@ def _get_by_config_id(cls, config_id: str, version_number: Optional[str] = None)
        for fil in filters:
            fil.update({"config_id": config_id})
        return cls._repository._load_all(filters)

    @classmethod
    def _clone(cls, scenario: Scenario) -> Scenario:
Member

We should accept an optional name and an optional date in the method signature. It does not change much for the name but it does for the creation date. This has an impact on the potential cycle as the new scenario might be on a different cycle.

Member Author

hm? I can understand the name, but the creation date?

Member

Let's say I have a January scenario I am happy with. I want to start from a duplicate of it to compute my February scenario. So at the beginning of February, I duplicate my January scenario, passing the current date so that the new scenario lands in the February cycle.

Does that make sense to you? And @FlorianJacta, any opinion on that?

Member

I agree; duplication for me was not about the creation date. Your use case is a real use case from CFM
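
The January/February use case can be sketched as follows. This is only an illustration: `plan_duplicate` and `monthly_cycle_key` are hypothetical names, the (year, month) key stands in for whatever cycle frequency is configured, and falling back to the source scenario's date when none is passed is an assumption.

```python
from datetime import datetime
from typing import Optional, Tuple


def monthly_cycle_key(creation_date: datetime) -> Tuple[int, int]:
    """Identify a monthly cycle by its (year, month) pair."""
    return (creation_date.year, creation_date.month)


def plan_duplicate(
    source_name: str,
    source_date: datetime,
    new_name: Optional[str] = None,
    new_creation_date: Optional[datetime] = None,
) -> Tuple[str, datetime, bool]:
    """Resolve the name, creation date, and same-cycle flag of the duplicate."""
    name = new_name if new_name is not None else source_name
    date = new_creation_date if new_creation_date is not None else source_date
    # The duplicate stays in the source cycle only if both dates share a cycle key.
    same_cycle = monthly_cycle_key(date) == monthly_cycle_key(source_date)
    return name, date, same_cycle
```

When `same_cycle` is false, cycle-scoped data nodes cannot be shared with the source scenario and would have to be duplicated or looked up in the new cycle.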

@@ -521,3 +521,47 @@ def _get_by_config_id(cls, config_id: str, version_number: Optional[str] = None)
        for fil in filters:
            fil.update({"config_id": config_id})
        return cls._repository._load_all(filters)

    @classmethod
    def _clone(cls, scenario: Scenario) -> Scenario:
Member

According to the issue, we should also accept an optional list of data nodes or data node IDs. Without a list, we should copy all the data. If a list is provided, only the files of the data nodes in the list should be copied.

Member Author

I'll leave it as the next step for now
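
For that follow-up step, the selection rule described above could look roughly like this. The helper name `ids_to_copy` is hypothetical, and it works on plain ID strings rather than data node objects.

```python
from typing import Iterable, Optional, Set


def ids_to_copy(all_ids: Iterable[str], selected: Optional[Iterable[str]] = None) -> Set[str]:
    """Return the data node IDs whose files should be copied."""
    known = set(all_ids)
    if selected is None:
        # No list given: copy the data of every data node.
        return known
    # A list was given: copy only the listed data nodes that belong to the scenario.
    return known & set(selected)
```

Note that an empty list is different from no list: `ids_to_copy(ids, [])` copies nothing, while `ids_to_copy(ids)` copies everything.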

@@ -226,3 +231,22 @@ def _get_by_config_id(cls, config_id: str, version_number: Optional[str] = None)
        for fil in filters:
            fil.update({"config_id": config_id})
        return cls._repository._load_all(filters)

    @classmethod
    def _clone(cls, task: Task, cycle_id: Optional[CycleId] = None, scenario_id: Optional[ScenarioId] = None) -> Task:
Member

I would rename the _clone method to _duplicate, as it does not return another instance of the same object. It returns a similar object with a few differences (ids, sub-entities' ids, paths, etc.).

Comment on lines 196 to 199
def _clone_data(self):
    new_data_path = self._clone_data_file(self.id)
    self._properties[self._PATH_KEY] = new_data_path
    return new_data_path
Member

I would move that out of the data node and into the data manager.
Just as we handle the parent_ids and the owner_id in the manager, I would set the path property in the manager as well (still retrieving the value from a _FileDataNodeMixin method).
This is debatable, though...

Member Author

Well, this is specific to file DNs only. If we put it in the data manager, we're grouping it with other types of DNs like SQL or Mongo, and I don't think that's a good idea.
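
One way to picture the trade-off discussed in this thread: the mixin computes the new path, and the manager sets the property only for file-backed nodes. All class and function names below are hypothetical stand-ins, not Taipy classes, and no file is actually copied.

```python
import os


class Node:
    """Stand-in for a data node holding an id and a properties dict."""

    def __init__(self, node_id: str, properties: dict):
        self.id = node_id
        self.properties = dict(properties)


class FileBackedMixin:
    """Stand-in for a file data node mixin: it only computes the new path."""

    PATH_KEY = "path"

    def new_data_path(self, new_id: str) -> str:
        folder, base_name = os.path.split(self.properties[self.PATH_KEY])
        return os.path.join(folder, f"TAIPY_CLONE_{new_id}_{base_name}")


class CsvNode(FileBackedMixin, Node):
    pass


class SqlNode(Node):
    pass


def set_duplicate_path(node: Node) -> None:
    """Manager-side step: only file-backed nodes get a new path property."""
    if isinstance(node, FileBackedMixin):
        node.properties[FileBackedMixin.PATH_KEY] = node.new_data_path(node.id)
```

The `isinstance` check is the cost of the manager-side approach: the manager has to know about file-backed nodes, which is the coupling the author objects to.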

Labels
Core: Data node Core: 🎬 Scenario & Cycle Core Related to Taipy Core 🟨 Priority: Medium Not blocking but should be addressed
3 participants