Data Validation

Julien A. Raemy edited this page Aug 11, 2025 · 1 revision

📢 Important Update: LINDAS is undergoing a major infrastructure migration to LINDASnext. The current Stardog-based system is being replaced with GraphDB EE. New contracts have been awarded in two lots for the period 2025-2034: Lot 1 (Infrastructure) to Cognizone and Lot 2 (Application Development) to Liip, Zazuko, and Adnovum with their respective partners. Services remain operational during the transition. See LINDASnext for details.

LINDAS Data Validation

Data validation in LINDAS is based on the Shapes Constraint Language (SHACL), ensuring RDF data quality and compliance with platform standards.

SHACL Implementation in LINDAS

Current SHACL Files

SHACL validation shapes are manually created and maintained by Zazuko. All current SHACL files are documented in the Cube Schema repository under Validation shapes.

Retrieval Methods: The Cube Schema documentation explains the different ways to access SHACL files (for example by specific version or by latest release).
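As a minimal sketch of one retrieval form, the helper below builds a profile URL following the `https://cube.link/ref/main/shape/...` pattern used in the Cube Validator command on this page. The function name and the `ref` parameter are hypothetical, and treating `main` as a substitutable reference is an assumption; the URL forms for specific versions or the latest release are described in the Cube Schema documentation and are not reproduced here.

```python
from urllib.parse import quote

CUBE_LINK_BASE = "https://cube.link"


def profile_url(profile: str, ref: str = "main") -> str:
    """Build a shape URL of the form https://cube.link/ref/<ref>/shape/<profile>.

    Only the ref "main" is confirmed by this page; other refs are an assumption.
    """
    return f"{CUBE_LINK_BASE}/ref/{quote(ref, safe='')}/shape/{quote(profile, safe='')}"


if __name__ == "__main__":
    print(profile_url("standalone-cube-constraint"))
```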

Usage Across Tools

SHACL files are integrated into multiple LINDAS components:

  • Cube Validator: Online validation service
  • Cube Creator: Built-in validation during cube creation
  • Zazuko Pipelines: Automated validation in data processing workflows

Stardog Integration Status

As of October 2024, according to Zazuko, the SHACL files are not used directly as Stardog Data Quality Constraints.

Validation Profiles

Production-Ready Profiles

| Profile | Description | Use Case |
|---|---|---|
| standalone-cube-constraint | Standard cube profile with minimal metadata | General cube validation |
| standalone-constraint-constraint | Minimal dimensions metadata (included in standalone-cube-constraint) | Dimension validation |
| profile-opendataswiss | Extends standalone-cube-constraint for OpenData.swiss Handbook compliance | OpenData.swiss publication |
| profile-opendataswiss-lindas | Specialized metadata for cubes published from LINDAS to OpenData.swiss | LINDAS→OpenData.swiss workflow |
| profile-visualize | Cube metadata requirements for Visualize platform compatibility | Visualize integration |
| basic-cube-constraint | Minimum cube requirements (typically not used directly) | Base validation |

Profiles Under Development

| Proposed Profile | Description | Status |
|---|---|---|
| datacatalog / dataset | Generic dataset metadata (extends OpenData.swiss profile) | |
| Shared dimensions | Validation for reusable dimensions and hierarchies | Partial implementation available from Cube Creator |
| Hierarchies templates | SHACL shapes for dimensional hierarchies | Partial implementation available from Cube Creator |

Note: Zazuko has indicated that Cube Creator already contains shapes for both Shared Dimensions and Hierarchies that could be extracted and shared.

Validation Tools

Online Validation Services

Cube Validator

Primary LINDAS validation service: https://cube-validator.lindas.admin.ch

Features:

  • SHACL-based cube validation
  • Detailed error reporting with severity levels
  • Integration with LINDAS SPARQL endpoints

Technical Implementation: The validator uses Barnard59 CLI commands. Example command structure:

npx barnard59 cube fetch-metadata \
  --endpoint "https://lindas.admin.ch/query" \
  --cube "https://environment.ld.admin.ch/foen/ubd010701/8" |
npx barnard59 cube check-metadata \
  --profile "https://cube.link/ref/main/shape/standalone-cube-constraint" |
npx barnard59 shacl report-summary
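The same three-stage command can be driven from a script. The following is a sketch only, assuming `npx` and barnard59 are available on the machine; the helper names `build_stages` and `run_stages` are hypothetical, and the subcommands and flags are copied from the command above rather than from any additional barnard59 documentation.

```python
import subprocess
from typing import List, Tuple


def build_stages(endpoint: str, cube: str, profile: str) -> List[List[str]]:
    """Return the three barnard59 invocations shown above as argument lists."""
    return [
        ["npx", "barnard59", "cube", "fetch-metadata",
         "--endpoint", endpoint, "--cube", cube],
        ["npx", "barnard59", "cube", "check-metadata", "--profile", profile],
        ["npx", "barnard59", "shacl", "report-summary"],
    ]


def run_stages(stages: List[List[str]]) -> Tuple[int, str]:
    """Chain the stages like a shell pipe; return the last exit code and its stdout."""
    procs = []
    upstream = None
    for args in stages:
        proc = subprocess.Popen(args, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            # Close our copy so the upstream process sees a broken pipe
            # if the downstream process exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    return procs[-1].returncode, output.decode()


if __name__ == "__main__":
    stages = build_stages(
        "https://lindas.admin.ch/query",
        "https://environment.ld.admin.ch/foen/ubd010701/8",
        "https://cube.link/ref/main/shape/standalone-cube-constraint",
    )
    code, summary = run_stages(stages)
    print(summary)
```

Passing argument lists instead of a shell string avoids quoting problems when cube or endpoint URLs contain special characters.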

Open Questions:

  • Observation Tab: Source of observation validation results unclear
  • Severity Levels: Mapping between SHACL sh:severity values and displayed severities needs clarification

SHACL Playground (Zazuko)

Interactive SHACL testing: https://shacl-playground.zazuko.com/

Features:

  • Real-time SHACL rule testing
  • Referenced as "newer implementation" by shacl.org
  • Ideal for development and testing

Additional Validation Tools

| Tool | Description | URL | Use Case |
|---|---|---|---|
| CSV Validation (Zazuko) | CSV file validation | https://zazuko.com/csv-validate/ | Pre-processing validation |
| CSV-Lint | Generic CSV validation | https://csvlint.io/ | Data quality checks |

Local Validation

Apache Jena SHACL

Command-line validation toolkit: https://jena.apache.org/documentation/shacl/

Installation and Usage:

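# Install the Apache Jena binary distribution and add its bin/ directory
# to PATH; the shacl command ships with it.
# Then validate a data file against a shapes file:
shacl validate --shapes shape_file.ttl --data data_file.ttl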

Python-based Validation

PySHACL: a SHACL validator for Python built on RDFLib

Suitable for integration into Python-based data processing workflows.
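A minimal sketch of such an integration follows, assuming `pyshacl` and `rdflib` are installed (`pip install pyshacl`). The shape and data are toy examples in a made-up `http://example.org/` namespace, not LINDAS profiles, and `validate_turtle` is a hypothetical helper name. It also tallies the raw `sh:resultSeverity` values in the report, which may help when comparing them with the severities a tool displays.

```python
from collections import Counter
from typing import Dict, Tuple

# Toy shape: every ex:Dataset needs at least one schema:name.
SHAPES_TTL = """
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .

ex:DatasetShape a sh:NodeShape ;
    sh:targetClass ex:Dataset ;
    sh:property [ sh:path schema:name ; sh:minCount 1 ] .
"""

# Toy data: a dataset without a name, so validation should fail.
DATA_TTL = """
@prefix ex: <http://example.org/> .

ex:d1 a ex:Dataset .
"""


def validate_turtle(data_ttl: str, shapes_ttl: str) -> Tuple[bool, Dict[str, int]]:
    """Validate Turtle data against Turtle shapes; return (conforms, severity counts)."""
    # Third-party imports are kept local so this module loads without them.
    from rdflib import Graph, Namespace
    from pyshacl import validate

    sh = Namespace("http://www.w3.org/ns/shacl#")
    data_graph = Graph().parse(data=data_ttl, format="turtle")
    shapes_graph = Graph().parse(data=shapes_ttl, format="turtle")

    conforms, report_graph, _report_text = validate(data_graph, shacl_graph=shapes_graph)

    # Count validation results per severity IRI.
    severities = Counter(str(s) for s in report_graph.objects(None, sh.resultSeverity))
    return conforms, dict(severities)


if __name__ == "__main__":
    ok, counts = validate_turtle(DATA_TTL, SHAPES_TTL)
    print("conforms:", ok)
    print("results by severity:", counts)
```

To validate against a LINDAS profile instead, the shapes text would be replaced by the contents of the relevant SHACL file from the Cube Schema repository.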

Validation Workflow Integration

Cube Creator Integration

  • Automatic validation during cube creation process
  • Real-time feedback on data quality issues
  • Profile-specific validation based on intended use

Pipeline Integration

Validation can be integrated into automated data processing pipelines:

  1. Data Ingestion: Initial format validation
  2. Transformation: Intermediate validation checks
  3. Publication: Final SHACL compliance validation
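The three stages above can be sketched as a small gatekeeping routine. This is an illustration under assumptions, not the Zazuko pipeline implementation: every function and class name is hypothetical, the ingestion and transformation checks are simple stand-ins, and the SHACL step is a callable supplied by the caller (for example a wrapper around a SHACL validator).

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StageResult:
    stage: str
    errors: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


def check_ingestion(csv_text: str, required_columns: List[str]) -> StageResult:
    """Stage 1: confirm the CSV header contains every required column."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    missing = [c for c in required_columns if c not in header]
    return StageResult("ingestion", [f"missing column: {c}" for c in missing])


def check_transformation(csv_text: str, numeric_column: str) -> StageResult:
    """Stage 2: confirm one column holds a numeric value in every data row."""
    errors = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        try:
            float(row[numeric_column])
        except (KeyError, TypeError, ValueError):
            errors.append(f"line {line_no}: non-numeric {numeric_column}")
    return StageResult("transformation", errors)


def check_publication(shacl_check: Callable[[], bool]) -> StageResult:
    """Stage 3: delegate the final check to a caller-supplied SHACL validation."""
    conforms = shacl_check()
    return StageResult("publication", [] if conforms else ["SHACL validation failed"])


def run_pipeline(csv_text: str, required_columns: List[str],
                 numeric_column: str, shacl_check: Callable[[], bool]) -> List[StageResult]:
    """Run the stages in order and stop at the first one that reports errors."""
    stages = [
        lambda: check_ingestion(csv_text, required_columns),
        lambda: check_transformation(csv_text, numeric_column),
        lambda: check_publication(shacl_check),
    ]
    results = []
    for stage in stages:
        result = stage()
        results.append(result)
        if not result.ok:
            break
    return results
```

Stopping at the first failing stage keeps cheap format checks in front of the more expensive SHACL run.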

Example Pipeline Template

Reference implementation: Python Pipeline Template

Note: Template documentation should be updated to reflect current Cube Validator implementation.

Alternative Approaches

Code-based Validation

Although Python-based validation examples exist, SHACL-based validation is the recommended approach for LINDAS. Code-based validation should only supplement SHACL when a specific requirement cannot be expressed in SHACL.
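As a hypothetical illustration of a supplementary check, suppose a publisher wants a yearly series without gaps. This rule is not drawn from any LINDAS profile; it is chosen here only because it spans the whole set of observations and is awkward to state with SHACL Core constraints. The function name is an assumption.

```python
from typing import Iterable, List


def find_missing_years(years: Iterable[int]) -> List[int]:
    """Return the years absent from an otherwise continuous yearly series."""
    present = set(years)
    if not present:
        return []
    # Every year between the first and last observed year should be present.
    return [y for y in range(min(present), max(present) + 1) if y not in present]


if __name__ == "__main__":
    gaps = find_missing_years([2019, 2020, 2022])
    print("missing years:", gaps)
```

A check like this would run alongside SHACL validation, not replace it.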

SHACL vs. Code-based Trade-offs

| Approach | Advantages | Disadvantages |
|---|---|---|
| SHACL | Standardized, tool-agnostic, declarative | Learning curve, limited expressiveness |
| Code-based | Flexible, supports complex logic | Maintenance overhead, tool-specific |

Historical Context

Legacy SHACL Files

Original SHACL files created for LINDAS are archived at: https://github.com/zazuko/cube-shacl-validation

Status: No updates since November 2021; superseded by current implementation.

Evolution Timeline

  1. Initial Implementation (2021): Basic SHACL validation
  2. Production Integration (2022-2023): Tool integration and refinement
  3. Profile Specialization (2024): Multiple validation profiles for different use cases
  4. LINDASnext Preparation (2025): Migration considerations and enhancements

Best Practices

For Data Publishers

  1. Validate Early: Use Cube Validator during development
  2. Choose Appropriate Profile: Select validation profile matching intended use
  3. Iterative Improvement: Address validation errors incrementally
  4. Documentation: Document custom validation requirements

For Platform Administrators

  1. Regular Profile Updates: Keep SHACL files synchronized with requirements
  2. Tool Integration: Ensure validation is integrated into all data pathways
  3. Error Monitoring: Track validation failures across the platform
  4. Community Feedback: Gather input for SHACL improvements

Support and Resources

Documentation

Community Support


For the latest SHACL files and validation profiles, always refer to the Cube Schema repository as the authoritative source.
