Proposal: Solution for Purl Type Definitions #310

Open
stevespringett opened this issue Jun 24, 2024 · 5 comments
Labels
Ecma specification Work on the core specification PURL type definition Non-core definitions that describe and standardize PURL types PURL validation

Comments

@stevespringett
Member

Proposal: Solution for Purl Type Definitions

Defining purl types in text within a single document has led to significant inconsistencies across the board. This approach introduces ambiguity, as different interpretations of the textual descriptions can arise, leading to varied implementations and integrations. The lack of a standardized, machine-readable format means that errors and discrepancies are more likely when different parties try to use purl types. Additionally, as the number of purl types grows, maintaining a single document becomes increasingly unwieldy, making it harder to ensure consistency and accuracy. These issues highlight the need for a more structured and formalized method of defining purl types, such as using a schema or other machine-readable format, to promote uniformity and reliability across implementations.

To address the inconsistencies arising from defining purl types in text within a single document, we propose a robust solution that transitions to a machine-readable format using JSON Schema. This approach offers several benefits, including improved consistency, maintainability, and scalability.

Solution Components

1. JSON Schema for Purl Types

By creating a JSON Schema and defining each purl type in a JSON document that conforms to it, we establish a standardized, machine-readable format that eliminates ambiguity and ensures consistency across implementations. The schema defines the structure and constraints that every purl type definition must follow, providing clear and precise guidelines that all implementations can interpret uniformly.
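A minimal sketch of what such a schema and a conforming definition might look like, written as Python data so it can be checked without extra dependencies. Every field name here (description, namespace.requirement, name.case_sensitive) is a hypothetical illustration, not an agreed format. The missing_required helper only checks top-level "required" keys as a stdlib stand-in; a real project would run a full JSON Schema validator, for example jsonschema.validate(instance, schema) from the Python jsonschema package.

```python
import json

# Hypothetical meta-schema (JSON Schema draft 2020-12) for purl type definition
# files. Field names are illustrative, not an agreed format.
PURL_TYPE_META_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "purl type definition (sketch)",
    "type": "object",
    "required": ["type", "description", "namespace", "name"],
    "properties": {
        # Lowercase ASCII letters, digits, '.', '+', '-'; starts with a letter.
        "type": {"type": "string", "pattern": "^[a-z][a-z0-9.+-]*$"},
        "description": {"type": "string"},
        "namespace": {
            "type": "object",
            "required": ["requirement"],
            "properties": {
                "requirement": {"enum": ["required", "optional", "prohibited"]}
            },
        },
        "name": {
            "type": "object",
            "properties": {"case_sensitive": {"type": "boolean"}},
        },
    },
}

# A hypothetical pypi definition that would conform to the meta-schema.
PYPI_DEFINITION = json.loads("""
{
  "type": "pypi",
  "description": "Python packages",
  "namespace": {"requirement": "prohibited"},
  "name": {"case_sensitive": false}
}
""")


def missing_required(instance: dict, schema: dict) -> list:
    """Stdlib-only stand-in that reports absent top-level "required" keys.

    This is NOT a JSON Schema validator; a real tool would check every keyword.
    """
    return [key for key in schema.get("required", []) if key not in instance]


print(missing_required(PYPI_DEFINITION, PURL_TYPE_META_SCHEMA))  # []
```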

2. Dynamic Purl Type Support

Purl implementations could theoretically read in these JSON-based purl type definitions dynamically. This means that any current or future purl type can be supported without hard-coding rules for each type. This dynamic approach reduces maintenance overhead and allows for seamless integration of new purl types as they are defined.
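A minimal sketch of dynamic support, assuming each definition carries a hypothetical namespace.requirement field with the values required, optional, or prohibited. The registry is keyed by type name and the code branches only on the rule vocabulary, so adding a type means adding a JSON file rather than a new code path. The requirements shown for pypi, maven, and npm are illustrative.

```python
import json

# Hypothetical definitions, as they might be read from per-type JSON files at runtime.
RAW_DEFINITIONS = [
    '{"type": "pypi", "namespace": {"requirement": "prohibited"}}',
    '{"type": "maven", "namespace": {"requirement": "required"}}',
    '{"type": "npm", "namespace": {"requirement": "optional"}}',
]


def load_registry(raw_definitions):
    """Build a lookup table keyed by purl type; new types are data, not code."""
    registry = {}
    for raw in raw_definitions:
        definition = json.loads(raw)
        registry[definition["type"]] = definition
    return registry


def check_namespace(registry, purl_type, namespace):
    """Return an error message, or None if the namespace satisfies the type's rule."""
    definition = registry.get(purl_type)
    if definition is None:
        return f"unknown purl type: {purl_type}"
    requirement = definition["namespace"]["requirement"]
    if requirement == "required" and not namespace:
        return f"{purl_type} requires a namespace"
    if requirement == "prohibited" and namespace:
        return f"{purl_type} must not have a namespace"
    return None  # "optional", or the rule is satisfied


registry = load_registry(RAW_DEFINITIONS)
print(check_namespace(registry, "maven", None))  # maven requires a namespace
print(check_namespace(registry, "pypi", None))  # None
```

The trade-off is that the rule vocabulary itself has to be standardized: every implementation must understand the same set of rule keywords, or dynamic loading breaks down.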

3. Separation of Purl Type Definitions

Instead of maintaining all purl types in a single document, we propose separating out the purl type definitions into individual JSON files. This change eliminates the need for the current PURL-TYPES.rst file, thus removing a significant source of inconsistency and making the management of purl types more modular and scalable.

4. Directory Structure for Purl Types

Each purl type will have its own directory, containing:

  • Purl Type Definition (JSON): The JSON file defining the purl type.
  • Auto-generated Human-readable Definition: A human-readable version of the purl type definition, auto-generated from the JSON, ensuring that the human-readable documentation is always in sync with the machine-readable definitions.
  • Purl Test Cases: Specific test cases for the purl type, ensuring that implementations can validate their handling of each type against standardized test data.
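A minimal sketch of the per-type directory layout and the auto-generation step, assuming hypothetical file names definition.json, README.md, and test-cases.json inside each type's directory. Only README.md is regenerated here; the generated page is a pure function of the JSON, so it cannot drift from the machine-readable definition.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout, one directory per purl type:
#   types/pypi/definition.json   machine-readable definition (source of truth)
#   types/pypi/README.md         generated from definition.json, never hand-edited
#   types/pypi/test-cases.json   type-specific test cases (not used below)


def render_markdown(definition: dict) -> str:
    """Generate the human-readable page from the JSON definition."""
    lines = [
        f"# {definition['type']}",
        "",
        definition.get("description", ""),
        "",
        f"- Namespace: {definition['namespace']['requirement']}",
    ]
    return "\n".join(lines) + "\n"


def regenerate_docs(types_root: Path) -> list:
    """Rewrite README.md next to every definition.json; return the types processed."""
    processed = []
    for definition_file in sorted(types_root.glob("*/definition.json")):
        definition = json.loads(definition_file.read_text(encoding="utf-8"))
        readme = definition_file.with_name("README.md")
        readme.write_text(render_markdown(definition), encoding="utf-8")
        processed.append(definition["type"])
    return processed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "types"
    (root / "pypi").mkdir(parents=True)
    (root / "pypi" / "definition.json").write_text(
        json.dumps({"type": "pypi", "description": "Python packages",
                    "namespace": {"requirement": "prohibited"}}),
        encoding="utf-8",
    )
    processed = regenerate_docs(root)
    readme_text = (root / "pypi" / "README.md").read_text(encoding="utf-8")

print(processed)  # ['pypi']
print(readme_text)
```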

5. Aggregation into a Single Distributable

While maintaining modularity with individual purl type definitions, there is a need to aggregate these definitions into a single distributable package. This approach ensures that all purl type definitions can be obtained from one source, simplifying the process for implementers. By leveraging the modularity of individual definitions, this distributable can be easily updated to include new or revised purl types without requiring prior knowledge of specific types. This ensures that users and systems always have access to the most current and comprehensive set of purl type definitions, facilitating seamless integration and support across various applications.
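A minimal sketch of the aggregation step, assuming a hypothetical layout in which each type's directory holds a definition.json file. The bundler treats every definition as opaque data keyed by its type, so a new type is included as soon as its directory exists; duplicate type names are rejected rather than silently overwritten. The output file name purl-types.json is an assumption.

```python
import json
import tempfile
from pathlib import Path


def build_bundle(types_root: Path) -> dict:
    """Aggregate every <type>/definition.json into one document keyed by type.

    No type is special-cased, so a new type is picked up as soon as its
    directory exists.
    """
    bundle = {"types": {}}
    for definition_file in sorted(types_root.glob("*/definition.json")):
        definition = json.loads(definition_file.read_text(encoding="utf-8"))
        purl_type = definition["type"]
        if purl_type in bundle["types"]:
            # Reject conflicts instead of silently overwriting one definition.
            raise ValueError(f"duplicate definition for type {purl_type!r}")
        bundle["types"][purl_type] = definition
    return bundle


def write_bundle(types_root: Path, output: Path) -> None:
    """Write the aggregated bundle as a single distributable JSON file."""
    text = json.dumps(build_bundle(types_root), indent=2, sort_keys=True)
    output.write_text(text + "\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for purl_type in ("pypi", "npm"):
        (root / purl_type).mkdir()
        (root / purl_type / "definition.json").write_text(
            json.dumps({"type": purl_type}), encoding="utf-8"
        )
    bundle = build_bundle(root)
    out_file = root / "purl-types.json"
    write_bundle(root, out_file)
    written = json.loads(out_file.read_text(encoding="utf-8"))

print(sorted(bundle["types"]))  # ['npm', 'pypi']
```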

6. Streamlining Ecma TC54-TG2 Review Process

This approach will also significantly benefit Ecma TC54-TG2, the task group responsible for standardizing purl and purl types. By adopting JSON Schema and modular definitions, TC54-TG2 can streamline its review process, making it easier to evaluate and approve new purl types.

Conclusion

This structured approach ensures clarity, consistency, and ease of maintenance, facilitating the reliable implementation and extension of purl types. By transitioning to JSON Schema, supporting dynamic type inclusion, separating definitions, organizing them into directories, and providing an aggregated distributable, we can significantly enhance the robustness and scalability of purl type management.

@stevespringett stevespringett added the PURL type definition Non-core definitions that describe and standardize PURL types label Jun 24, 2024
@stevespringett stevespringett self-assigned this Jun 24, 2024
@matt-phylum
Contributor

Related: #38

Is this talking about a JSON schema (a schema that is JSON) or JSON Schema (a specific meta schema for specifying JSON objects)?

How will the schema express normalization rules, especially cases like pypi, which replaces every run of the characters _, . and - with a single -¹?

I wonder if there are two problems here. Having a machine readable schema seems like it could be used to validate that a PURL for a package type is in the expected canonical form for implementations that have access to things like compatible regular expression engines. However, I think most of the time users only care about parsing and unparsing (does not use type-specific knowledge) or canonicalizing (requires limited type-specific knowledge). It's probably sufficient to have a limited number of full validator implementations.

¹ #165 #262

@stevespringett
Member Author

Is this talking about a JSON schema (a schema that is JSON) or JSON Schema (a specific meta schema for specifying JSON objects)?

I want to develop a JSON Schema that defines how to define a purl.

FYI, I'm currently developing a PoC locally and will check in a branch that has the start of the schema along with a few examples. I'll be sure to include Python in the examples. But I am accounting for normalization, so we should be able to describe different types of rules for substitution, encoding, etc. I'd like to have something that's super lightweight and easy to work with. Nothing too complex.
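One way substitution and case-folding rules could be described as data, sketched under assumptions: the op, pattern, and replacement keys are hypothetical, and the pattern [-_.]+ uses a regex subset that behaves the same in most engines. For pypi, lowercasing and collapsing runs of -, _ and . into a single - matches the name normalization in PEP 503. The is_canonical helper reflects the distinction raised earlier in the thread: checking canonical form only requires normalizing and comparing.

```python
import json
import re

# Hypothetical encoding of normalization rules in a pypi type definition.
# Each rule is data (a case fold or a regex substitution), applied in order.
PYPI_NAME_RULES = json.loads("""
[
  {"op": "lowercase"},
  {"op": "substitute", "pattern": "[-_.]+", "replacement": "-"}
]
""")


def apply_rules(value: str, rules: list) -> str:
    """Apply declarative normalization rules without any type-specific code."""
    for rule in rules:
        if rule["op"] == "lowercase":
            value = value.lower()
        elif rule["op"] == "substitute":
            value = re.sub(rule["pattern"], rule["replacement"], value)
        else:
            raise ValueError(f"unsupported rule: {rule['op']!r}")
    return value


def is_canonical(value: str, rules: list) -> bool:
    """A name is in canonical form if normalizing it changes nothing."""
    return apply_rules(value, rules) == value


print(apply_rules("Django_REST.framework", PYPI_NAME_RULES))  # django-rest-framework
print(is_canonical("zope.interface", PYPI_NAME_RULES))  # False
```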

@pombredanne
Member

@stevespringett I mistakenly thought you were suggesting using JSON Schema to define the PURL syntax and the specific validation for each type!

I now understand that you want instead to create a small JSON schema for a new JSON format that will document and specify in a structured way each PURL type. This makes a lot of sense.

I am comfy that we can define the format for this!

And beyond this, would there be a way to generate a library in multiple programming languages that would use some JSON definitions? This would be awesome, but this is likely a pipe dream.

@stevespringett
Member Author

would there be a way to generate a library in multiple programming languages that would use some JSON definitions?

That is exactly the ideal state that I want to get to @pombredanne. It would dramatically reduce the if-then-else logic in all the implementations to account for the differences in the various purl types.
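A minimal sketch of the code-generation idea, assuming a hypothetical namespace.requirement field in each definition: a Python script reads the definitions and emits a TypeScript module containing a lookup table. Generators for other languages would follow the same pattern, so the per-type differences live in data instead of in hand-written if-then-else chains. The output language and the NAMESPACE_REQUIREMENT name are illustrative only.

```python
import json


def generate_typescript(definitions: list) -> str:
    """Emit a TypeScript module with each type's namespace rule as a lookup table.

    The same definitions could drive generators for other target languages.
    """
    entries = []
    for definition in sorted(definitions, key=lambda d: d["type"]):
        requirement = definition["namespace"]["requirement"]
        # json.dumps yields double-quoted, escaped strings that are valid TS literals.
        entries.append(f"  {json.dumps(definition['type'])}: {json.dumps(requirement)},")
    return (
        "// Generated from purl type definitions; do not edit by hand.\n"
        "export const NAMESPACE_REQUIREMENT: Record<string, string> = {\n"
        + "\n".join(entries)
        + "\n};\n"
    )


print(generate_typescript([
    {"type": "pypi", "namespace": {"requirement": "prohibited"}},
    {"type": "maven", "namespace": {"requirement": "required"}},
]))
```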

@johnmhoran johnmhoran added the Ecma specification Work on the core specification label Nov 5, 2024
@JimFuller-RedHat

good idea - I would say pick a type (or a few) and raise a PR ;) ... I have opinions on whether JSON Schema has enough expressive power to resolve matters of schema ... but that is only because I am a database snob ... I am sure it is fine (and people are always free to use some other schema tech that can be converted (mostly) to JSON Schema).
