Align and standardise metadata in final CF documentation artifacts (PDF and HTML) #631

@cofinoa

Requirement Summary

The final CF documentation artifacts (PDF and HTML) should expose clear, consistent, and standards-friendly metadata suitable for indexing, citation, and long-term archival (e.g. Zenodo, libraries, search engines). At present, metadata is incomplete, partially inconsistent, and not fully aligned with current standards or best practices.

This does not affect the normative content of the CF conventions or the conformance documents.

Technical Proposal Summary

Introduce a minimal and consistent metadata model for the final CF artifacts (PDF and HTML), aligned with widely recognised standards (e.g. Dublin Core, modern HTML metadata, PDF 2.0 as defined in ISO 32000-2), without changing any normative content of the CF specification.

Benefits

  • Better discoverability and indexing of CF documentation.
  • Cleaner and more accurate metadata for Zenodo and other archives.
  • Clear separation between publication metadata and build/tooling metadata.
  • Improved consistency between PDF and HTML artifacts.

Status Quo

Currently:

  • The PDF Producer field contains incorrect information. In the PDF standard, Creator names the application that created the original document and Producer names the application that converted it to PDF; neither is meant to describe the document authors. At present, authors are incorrectly listed in Producer.
  • Creator reflects tooling, which is appropriate, but is not clearly distinguished from content authorship.
  • PDF metadata does not include Description, Keywords, or explicit CF version information.
  • HTML metadata is minimal, tool-generated, and not aligned with standard vocabularies.
  • Metadata is not consistently aligned between PDF, HTML, and conformance documents.
  • There is no explicit alignment with newer PDF metadata practices (PDF 2.0, ISO 32000-2).
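
The gaps listed above could be checked mechanically before and after any change. The following is a minimal sketch using only the Python standard library; the function names, the list of required `<meta>` names, and the sample HTML are illustrative assumptions, not part of the CF build. It assumes the PDF information dictionary has already been read into a plain `dict` by whatever PDF library is in use (not shown).

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Meta names are compared case-insensitively here.
            self.meta[name.lower()] = content


def missing_meta(html_text, required=("description", "keywords", "author")):
    """Return the required meta names that are absent or empty (hypothetical helper)."""
    collector = MetaCollector()
    collector.feed(html_text)
    return [name for name in required if not collector.meta.get(name)]


def authors_in_producer(pdf_info, authors):
    """Return the author names that appear in the PDF Producer field.

    `pdf_info` is assumed to be the document information dictionary,
    already extracted into a dict of strings by some PDF library.
    """
    producer = pdf_info.get("Producer", "").lower()
    return [author for author in authors if author.lower() in producer]


# Illustrative input only: a page with tool-generated metadata and nothing else.
sample = (
    '<head><meta charset="UTF-8">'
    '<meta name="generator" content="some-tool">'
    "<title>Example</title></head>"
)
print(missing_meta(sample))
print(authors_in_producer({"Producer": "Jane Doe; John Roe"}, ["Jane Doe"]))
```

A check of this kind keeps the audit separate from the toolchain, so it can be run against the published artifacts regardless of how they were generated.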

Associated pull request

None at present.

Detailed Proposal

This issue proposes to:

  1. Define a small, explicit set of metadata fields for CF artifacts:

    • Title
    • CF version
    • Publication date
    • Authors and affiliations
    • Description
    • Keywords
    • Persistent identifier (DOI)
    • Build timestamp (clearly distinct from publication date)
  2. For PDF artifacts:

    • Correct the semantics of Producer and Creator, ensuring they identify the software toolchain used to generate the PDF, as intended by the PDF standard.
    • Add missing descriptive metadata (e.g. Description, Keywords).
    • Move towards metadata structures compatible with PDF 2.0 (ISO 32000-2), which favours XMP metadata streams over the document information dictionary, where feasible.
  3. For HTML artifacts:

    • Improve and standardise <meta> entries.
    • Align HTML metadata with PDF metadata.
    • Use widely recognised, community-standard metadata conventions.
  4. Apply the same principles consistently to both the CF specification and the conformance document.

  5. Evaluate and implement these improvements using the existing Asciidoctor-based toolchain where possible. The tools already in use appear to support most of the required metadata without major plumbing changes, but this should be confirmed carefully to avoid unnecessary complexity or over-engineering.

All changes should be incremental, tooling-compatible, and non-normative.
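
As an illustration of the proposal, the field list in item 1 could be held in one shared record and rendered separately for each artifact. The sketch below is hypothetical: the class name, method names, the `DCTERMS.*` mapping, and every value in `example` (including the DOI, which is a placeholder) are assumptions for discussion, not an agreed design. Affiliations are omitted for brevity. CF version and DOI have no standard key in the PDF information dictionary, so the sketch leaves them out of `pdf_info`; they would more naturally go into XMP metadata.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from html import escape


@dataclass(frozen=True)
class ArtifactMetadata:
    """Hypothetical publication metadata shared by the PDF and HTML builds."""

    title: str
    cf_version: str
    publication_date: date
    authors: tuple
    description: str
    keywords: tuple
    doi: str
    build_timestamp: datetime  # tooling metadata, distinct from publication_date

    def html_meta(self):
        """Render standard HTML meta names plus Dublin Core terms."""
        pairs = [
            ("description", self.description),
            ("keywords", ", ".join(self.keywords)),
            ("DCTERMS.title", self.title),
            ("DCTERMS.issued", self.publication_date.isoformat()),
            ("DCTERMS.identifier", "https://doi.org/" + self.doi),
            ("DCTERMS.hasVersion", self.cf_version),
        ]
        for author in self.authors:
            pairs.append(("author", author))
            pairs.append(("DCTERMS.creator", author))
        # The schema link declares the DCTERMS prefix used in the names above.
        lines = ['<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">']
        lines += [
            '<meta name="{}" content="{}">'.format(escape(name), escape(value))
            for name, value in pairs
        ]
        return lines

    def pdf_info(self, creator_tool, producer_tool):
        """Map fields onto PDF information dictionary keys.

        Creator and Producer take tool names, never author names.
        """
        return {
            "Title": self.title,
            "Author": "; ".join(self.authors),
            "Subject": self.description,
            "Keywords": ", ".join(self.keywords),
            "Creator": creator_tool,
            "Producer": producer_tool,
        }


# Placeholder values only; none of these are real CF publication data.
example = ArtifactMetadata(
    title="Example Conventions Document",
    cf_version="0.0",
    publication_date=date(2000, 1, 1),
    authors=("A. Author", "B. Author"),
    description="Placeholder description.",
    keywords=("example", "metadata"),
    doi="10.0000/placeholder",
    build_timestamp=datetime(2000, 1, 2, tzinfo=timezone.utc),
)
print("\n".join(example.html_meta()))
print(example.pdf_info(creator_tool="authoring-tool", producer_tool="pdf-library"))
```

Keeping a single record as the source for both renderings is one way to guarantee that the PDF and HTML metadata cannot drift apart; whether the existing Asciidoctor attributes can play that role directly is the point item 5 asks to confirm.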

Labels: GitHub, enhancement