Feature/276 harvesting metadata from a provided repository url #278

Draft
wants to merge 10 commits into
base: develop

Conversation

Aidajafarbigloo

Added a --url option to the hermes harvest command. The plain command hermes harvest still harvests metadata from the local repository, while hermes harvest --url <URL> harvests metadata from the provided URL, with support for GitHub and GitLab repositories.

(e.g., hermes harvest --url https://github.com/NFDI4Energy/SMECS)

@Aidajafarbigloo Aidajafarbigloo marked this pull request as draft October 29, 2024 08:49
@Aidajafarbigloo
Author

@sferenz
Could you please take a look at this pull request and share your feedback?

@Aidajafarbigloo
Author

Harvesting metadata from the provided URL (GitHub/GitLab). Command: hermes harvest --path <URL>
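
A single --path option has to decide whether its value is a repository URL or a local directory. The following is a minimal sketch of one way to make that decision, not the code from this pull request. The helper name is_remote_repo and the KNOWN_HOSTS tuple are hypothetical, and only the two public hosts are covered.

```python
from urllib.parse import urlparse

# Assumption: only the public GitHub and GitLab hosts are recognised.
# Self-hosted GitLab instances would need extra handling.
KNOWN_HOSTS = ("github.com", "gitlab.com")


def is_remote_repo(value: str) -> bool:
    """Return True if `value` looks like an http(s) URL on a known host."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and parsed.hostname in KNOWN_HOSTS


if __name__ == "__main__":
    print(is_remote_repo("https://github.com/NFDI4Energy/SMECS"))
    print(is_remote_repo(r"C:\path\to\your\directory"))
```

Checking the scheme explicitly matters on Windows, where a drive letter such as C: can otherwise be mistaken for a URL scheme.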

@Aidajafarbigloo
Author

@sferenz
Could you please take a look at this pull request and share your feedback?

Member

@sferenz sferenz left a comment

Thanks for the nice code! Please have a look at the comments :)

@@ -131,7 +131,7 @@ def init_command_parser(self, command_parser: argparse.ArgumentParser) -> None:
     def load_settings(self, args: argparse.Namespace):
         """Load settings from the configuration file (passed in from command line)."""

-        toml_data = toml.load(args.path / args.config)
+        toml_data = toml.load("." / args.config)
Member

Does this still work if a regular path is given to HERMES?

Author

Yes, specifying a directory path containing CFF or CodeMeta files is also acceptable. For example, the following command works:

hermes harvest --path C:\path\to\your\directory
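
As a side note on the changed line in load_settings: the expression "." / args.config only works when args.config is a pathlib.Path, and it resolves the configuration file relative to the current working directory rather than relative to --path. The sketch below illustrates that behaviour in isolation. It assumes the config argument is parsed as a pathlib.Path, and the function name config_location is hypothetical.

```python
import pathlib


def config_location(config: pathlib.Path) -> pathlib.Path:
    """Mirror the changed line: join "." with the config path."""
    # str / Path is handled by Path.__rtruediv__; the leading "." is
    # normalised away, so the result is relative to the current directory.
    return "." / config


if __name__ == "__main__":
    print(config_location(pathlib.Path("hermes.toml")))

    # With a plain string instead of a Path, the same expression fails.
    try:
        "." / "hermes.toml"
    except TypeError as err:
        print("TypeError:", err)
```

With this change, a configuration file that lives next to the harvested directory is only found when the command is run from that directory.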

@@ -25,7 +25,7 @@ def __call__(self, command: HermesCommand) -> t.Tuple[t.Dict, t.Dict]:
         pass


-class HarvestSettings(BaseModel):
+class _HarvestSettings(BaseModel):
Member

Why did you rename this?

Author

I didn’t intend to change the class name. There was an issue with incorrectly pulling the original code for base.py from an updated version of it. This occurred due to a recent update in the settings classes, where all were made private in the develop branch of HERMES (commit a6c1a5e).

return None


def _download_to_tempfile(url: str, filename: str) -> pathlib.Path:
Member

Do you delete the tempfiles later?

Author

In the current code the temp files for CFF and CodeMeta are stored separately in C:\Temp on the local machine.

Member

Will these files be deleted after the extraction process?

Author

Currently the files are not deleted after harvesting. I can modify the code to delete the temp files after extraction. Do you agree with this change?

Member

Yes, I think the temp files should be deleted at the end of the process.
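
One way to guarantee the cleanup agreed on here is to tie the download to a context manager, so the temporary directory disappears as soon as harvesting is done, even if extraction raises. This is a minimal sketch under stated assumptions, not the implementation in this pull request: the name downloaded_file is hypothetical, and the real _download_to_tempfile may work differently.

```python
import pathlib
import shutil
import tempfile
import urllib.request
from contextlib import contextmanager


@contextmanager
def downloaded_file(url: str, filename: str):
    """Yield the path of a temporary copy of `url` and delete it on exit."""
    # TemporaryDirectory removes the directory and its contents when the
    # with-block ends, whether it ends normally or through an exception.
    with tempfile.TemporaryDirectory(prefix="hermes_") as tmp_dir:
        target = pathlib.Path(tmp_dir) / filename
        with urllib.request.urlopen(url) as response, target.open("wb") as out:
            shutil.copyfileobj(response, out)
        yield target
```

A caller would wrap the extraction step in the context manager, for example `with downloaded_file(raw_url, "CITATION.cff") as cff_path: ...`, where raw_url stands for whatever download URL the harvester has built. The path must not be used after the block exits.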

@Aidajafarbigloo
Author

> Thanks for the nice code! Please have a look at the comments :)

@sferenz Thank you for the comments.

Add functionality to remove temp files generated during remote harvesting.
Remove temp files after harvesting CFF metadata