[Metadata Improvement]: create an html-clean-up helper #220

Open
2 of 17 tasks
gtsueng opened this issue Dec 5, 2024 · 0 comments
Labels
enhancement New feature or request

gtsueng commented Dec 5, 2024

Issue Name

create an html-clean-up helper

Issue Description

Some description field values across various repositories contain HTML-tagged or HTML-formatted text. These are difficult to review or read as plain text when a record is exported to .csv (or when viewing the JSON via the API).

To address this, create a helper function that takes a description field string and:

  1. Checks the string for common HTML tags (e.g. <p>, <br>, <i>)
  2. Uses BeautifulSoup or some other means to strip the HTML tags from the string

Once created, apply this helper function to the description field so that HTML tags are removed from the field itself and, consequently, from the .csv dumps of that field.
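
A minimal sketch of what such a helper could look like, offered as a starting point rather than the intended implementation. It takes the "some other means" route and uses Python's standard-library `html.parser` instead of BeautifulSoup so that it has no dependencies. The names `contains_html`, `clean_html`, and `clean_record`, the set of tags treated as block-level, and the assumption that records are dicts with a `description` key are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: these tags separate blocks of text, so they become a space.
_BLOCK_TAGS = {"p", "br", "div", "li", "ul", "ol", "tr", "td", "h1", "h2", "h3"}

# Step 1: cheap check for anything that looks like an HTML tag.
_TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


class _TextExtractor(HTMLParser):
    """Collects text content and drops the tags around it."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in _BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in _BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def contains_html(text):
    """Return True if the string appears to contain an HTML tag."""
    return bool(_TAG_RE.search(text))


def clean_html(text):
    """Step 2: strip tags; non-strings and tag-free strings pass through."""
    if not isinstance(text, str) or not contains_html(text):
        return text
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flushes any buffered trailing text
    # Collapse the whitespace left behind by removed tags.
    return " ".join("".join(parser.parts).split())


def clean_record(record, field="description"):
    """Return a copy of a record dict with the given field cleaned."""
    cleaned = dict(record)
    if field in cleaned:
        cleaned[field] = clean_html(cleaned[field])
    return cleaned


record = {"name": "example", "description": "<p>Hello <i>world</i></p><br>Next &amp; last"}
print(clean_record(record)["description"])  # Hello world Next & last
```

Running the helper before the .csv export would keep the dump and the API output consistent. This sketch does not special-case `<script>` or `<style>` contents; if descriptions can contain those, BeautifulSoup's `get_text()` or an explicit skip list may be a better fit.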

Issue Discussion

2024.10.29

Please select the type of metadata improvement

  • Standardization (normalizing free text to an ontology)
  • Augmentation (adding values for metadata fields missing values)
  • Clean up (addressing redundancy or messy metadata)
  • Structure (changing the structuring of the metadata to support front end UI features)

Meta URL

No response

Related WBS task

https://github.com/NIAID-Data-Ecosystem/nde-roadmap/issues/2

For internal use only. Assignee, please select the status of this issue

  • Not yet started
  • In progress
  • Blocked
  • Will not address

Status Description

No response

Request status check list

  • This metadata improvement has yet to be discussed between NIAID, Scripps, Leidos
  • This metadata improvement does not need to be discussed between NIAID, Scripps, Leidos
  • This metadata improvement has been discussed/reported between NIAID, Scripps, Leidos
  • This metadata improvement has been implemented locally to generate data for review
  • This metadata improvement has been implemented on Dev
  • This metadata improvement has been implemented on Dev and the results have been reviewed and approved for staging
  • This metadata improvement has been implemented on Staging
  • This page/documentation/change has been approved for Production
  • This page/documentation/change has been implemented on Production