Trailpack proposes a standard way to link data and specialized metadata in a single file. It provides a simple interface for linking metadata to fixed ontologies, which improves the accessibility and comparability of datasets from different sources.
Trailpack combines metadata and data into a single Parquet file, making open data more accessible and sustainable. It validates metadata against the developed standard, which covers:
- General metadata for the data package (name, license, contributors)
- Specialized metadata for each data column, linking both column names and units to fixed descriptions in ontologies provided by PyST
The developed standard expands on and is compatible with the Frictionless Data Package specification. The metadata is included under the datapackage.json keyword in the Parquet file.
The output file can be read with PyArrow and other data handlers, and will be compatible with and consumable by Sentier data tools.
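As a minimal sketch of reading the embedded metadata back, the snippet below assumes the descriptor is stored as UTF-8 JSON under the `datapackage.json` key of the Parquet key-value metadata. The key name comes from the description above; the encoding and the helper names `decode_datapackage` and `read_datapackage` are assumptions made for illustration and are not part of Trailpack.

```python
import json
from typing import Mapping, Optional


def decode_datapackage(schema_metadata: Optional[Mapping[bytes, bytes]]) -> dict:
    """Decode the Data Package descriptor from Parquet key-value metadata.

    PyArrow exposes schema metadata as a bytes-to-bytes mapping, or None
    when the file carries no metadata.
    """
    if not schema_metadata or b"datapackage.json" not in schema_metadata:
        raise KeyError("no 'datapackage.json' entry in the Parquet metadata")
    # Assumption: the stored value is a UTF-8 encoded JSON document.
    return json.loads(schema_metadata[b"datapackage.json"].decode("utf-8"))


def read_datapackage(path: str) -> dict:
    """Read the Parquet schema (not the data) and return the descriptor."""
    # Imported here so the decoding helper stays usable without PyArrow.
    import pyarrow.parquet as pq

    return decode_datapackage(pq.read_schema(path).metadata)
```

With a file produced by Trailpack, `read_datapackage("output.parquet")` should return the package metadata as a dictionary, while `pyarrow.parquet.read_table` loads the data itself.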
Origin: Trailpack was initially built during the hackathon of Brightcon 2025 in Grenoble, as part of developing the standard data format for Départ de Sentier.
You can install trailpack via pip from PyPI:

    pip install trailpack

The easiest way to use Trailpack is through the web application.
The web app provides a step-by-step workflow:
- Upload File & Select Language: Upload an Excel file and select language for PyST mapping
- Select Sheet: Choose which sheet to process with data preview
- Map Columns: Map each column to PyST concepts with automatic suggestions
- General Details: Provide package metadata (name, title, license, contributors)
- Download: Get your standardized Parquet file with embedded metadata
For walkthrough videos demonstrating the workflow, see the documentation.
You can also run the Streamlit UI locally:

    trailpack ui

For more details, see trailpack/ui/README.md.
Deploying to Streamlit Cloud? See STREAMLIT_DEPLOYMENT.md for complete deployment instructions.
Trailpack includes comprehensive schema classes for building Frictionless Data Package metadata:
- DataPackageSchema: Defines field types, validation rules, and UI configuration
- MetaDataBuilder: Fluent interface for creating metadata programmatically
- Field validation: Built-in validation for package names, versions, URLs
- UI integration ready: Field definitions include labels, placeholders, patterns
- Standards compliant: Follows Frictionless Data Package specification
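The package-name check can be illustrated with a short sketch. This is not Trailpack's implementation: the pattern follows the Frictionless Data Package v1 wording that a name is lower-case and contains only alphanumeric characters plus `.`, `_` and `-`, and the function name `is_valid_package_name` is hypothetical.

```python
import re

# Lower-case alphanumerics plus ".", "_" and "-" (assumed naming rule,
# taken from the Frictionless Data Package v1 description of "name").
PACKAGE_NAME_PATTERN = re.compile(r"[a-z0-9._-]+")


def is_valid_package_name(name: str) -> bool:
    """Return True if the whole string matches the assumed naming rule."""
    return PACKAGE_NAME_PATTERN.fullmatch(name) is not None


for candidate in ("my-dataset", "My Dataset", "emissions_2025.v1"):
    print(candidate, is_valid_package_name(candidate))
```

Under this rule a title such as "My Dataset" is rejected as a name because of the capital letters and the space, which is why the builder takes `name` and `title` as separate arguments.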

    from trailpack.packing import Packing
    from trailpack.packing.datapackage_schema import MetaDataBuilder, Resource

    # Create metadata with the fluent interface
    metadata = (
        MetaDataBuilder()
        .set_basic_info(name="my-dataset", title="My Dataset")
        .add_license("CC-BY-4.0")
        .add_contributor("Your Name", "author")
        .add_resource(Resource(name="data", path="data.parquet"))
        .build()
    )

    # Use with the Packing class (df is the DataFrame you want to package)
    packer = Packing(df, metadata)
    packer.write_parquet("output.parquet")

Trailpack includes a comprehensive validation system to ensure data quality and standards compliance:
- ✓ Metadata validation: Required fields, naming conventions, license checking
- ✓ Data quality metrics: Missing values and duplicates (logged as info, not errors)
- ✓ Type consistency: Mixed types and schema matching (raises errors)
- ✓ Unit requirements: All numeric fields must have units (including dimensionless)
- ✓ Compliance levels: STRICT, STANDARD, BASIC, or NON-COMPLIANT
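To make the distinction between informational metrics and hard errors concrete, here is a minimal sketch in plain Python. It is an illustration only: the helper `check_rows`, the row-as-dictionary layout, and the use of `None` for a missing value are all assumptions, and Trailpack's own validator may compute these checks differently.

```python
from collections import Counter


def check_rows(rows: list[dict]) -> dict:
    """Return informational metrics; raise TypeError on mixed-type columns."""
    # Missing values and duplicate rows are only counted, never fatal.
    missing = sum(value is None for row in rows for value in row.values())
    # Rows become sorted (column, value) tuples so they can be counted.
    row_counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(n - 1 for n in row_counts.values())

    # A column holding more than one Python type is treated as an error.
    columns = {key for row in rows for key in row}
    for column in sorted(columns):
        kinds = {
            type(row[column]).__name__
            for row in rows
            if row.get(column) is not None
        }
        if len(kinds) > 1:
            raise TypeError(f"column {column!r} mixes types: {sorted(kinds)}")
    return {"missing_values": missing, "duplicate_rows": duplicates}


rows = [
    {"id": 1, "mass": 2.5},
    {"id": 1, "mass": 2.5},
    {"id": 2, "mass": None},
]
print(check_rows(rows))
```

The sample rows contain one missing value and one repeated row, so the call reports both counts without failing; changing one `id` to a string would raise instead.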

    from trailpack.validation import StandardValidator

    # Create validator
    validator = StandardValidator("1.0.0")

    # Validate everything
    result = validator.validate_all(
        metadata=metadata_dict,
        df=dataframe,
        schema=schema_dict
    )

    # Check results
    if result.is_valid:
        print(f"{result.level}")  # e.g., "✅ STRICT COMPLIANCE"
    else:
        print(result)  # Shows all errors and warnings

All numeric fields must specify units, even for dimensionless quantities:
- Measurements: Use SI or domain units (kg, m, °C)
- IDs/Counts: Use dimensionless unit (http://qudt.org/vocab/unit/NUM)
- Percentages: Use percent or dimensionless
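The unit rule above can be sketched as a simple check over field definitions. The field layout used here (a `unit` entry next to the Frictionless `name` and `type` keys) and the helper `fields_missing_units` are assumptions for illustration; the actual schema structure Trailpack expects may differ.

```python
# Dimensionless unit URI quoted in the list above.
DIMENSIONLESS = "http://qudt.org/vocab/unit/NUM"
# Numeric field types defined by the Frictionless Table Schema.
NUMERIC_TYPES = {"integer", "number"}


def fields_missing_units(fields: list[dict]) -> list[str]:
    """Return the names of numeric fields that declare no unit."""
    return [
        field["name"]
        for field in fields
        if field.get("type") in NUMERIC_TYPES and not field.get("unit")
    ]


fields = [
    {"name": "site", "type": "string"},                 # not numeric, no unit needed
    {"name": "mass", "type": "number", "unit": "kg"},   # placeholder unit label
    {"name": "sample_id", "type": "integer", "unit": DIMENSIONLESS},
    {"name": "count", "type": "integer"},               # numeric without a unit
]
print(fields_missing_units(fields))
```

Only `count` is flagged here: identifiers and counts still pass as long as they carry the dimensionless unit.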
See trailpack/validation/README.md for complete documentation.
Contributions are very welcome! To learn more, see the Contributor Guide.
Install the package with development requirements:

    pip install -e ".[dev]"

Run tests:

    pytest

For more information, see CONTRIBUTING.md.
Distributed under the terms of the MIT license, trailpack is free and open source software.
If you encounter any problems, please file an issue along with a detailed description.
You can build the documentation locally by installing the documentation Conda environment:

    conda env create -f docs/environment.yml

activating the environment:

    conda activate sphinx_trailpack

and running the build command:

    sphinx-build docs _build/html --builder=html --jobs=auto --write-all; open _build/html/index.html