Data Validation
📢 Important Update: LINDAS is undergoing a major infrastructure migration to LINDASnext. The current Stardog-based system is being replaced with GraphDB EE. New contracts have been awarded in two lots for the period 2025-2034: Lot 1 (Infrastructure) to Cognizone and Lot 2 (Application Development) to Liip, Zazuko, and Adnovum with their respective partners. Services remain operational during the transition. See LINDASnext for details.
Data validation in LINDAS is based on the Shapes Constraint Language (SHACL), ensuring RDF data quality and compliance with platform standards.
SHACL validation shapes are manually created and maintained by Zazuko. All current SHACL files are documented in the Cube Schema repository under Validation shapes.
Retrieval Methods: The Cube Schema documentation describes the supported ways to retrieve SHACL files, for example by a specific version or by the latest release.
SHACL files are integrated into multiple LINDAS components:
- Cube Validator: Online validation service
- Cube Creator: Built-in validation during cube creation
- Zazuko Pipelines: Automated validation in data processing workflows
According to Zazuko communications, SHACL files are not currently used directly as Stardog Data Quality Constraints (as of October 2024).
| Profile | Description | Use Case |
|---|---|---|
| standalone-cube-constraint | Standard cube profile with minimal metadata | General cube validation |
| standalone-constraint-constraint | Minimal dimensions metadata (included in standalone-cube-constraint) | Dimension validation |
| profile-opendataswiss | Extends standalone-cube-constraint for OpenData.swiss Handbook compliance | OpenData.swiss publication |
| profile-opendataswiss-lindas | Specialized metadata for cubes published from LINDAS to OpenData.swiss | LINDAS→OpenData.swiss workflow |
| profile-visualize | Cube metadata requirements for Visualize platform compatibility | Visualize integration |
| basic-cube-constraint | Minimum cube requirements (typically not used directly) | Base validation |
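To make the profile names in the table easier to use in scripts, the following minimal sketch turns a profile name into a shape URL. It assumes that every profile follows the URL pattern of the single example given in this document (`https://cube.link/ref/main/shape/standalone-cube-constraint`); the helper name `profile_url` is hypothetical, and the Cube Schema repository remains the authority on the actual URLs.

```python
# Profile names copied from the table above.
KNOWN_PROFILES = {
    "standalone-cube-constraint",
    "standalone-constraint-constraint",
    "profile-opendataswiss",
    "profile-opendataswiss-lindas",
    "profile-visualize",
    "basic-cube-constraint",
}

# Assumption: all profiles share the URL pattern of the one example
# shown in this document (https://cube.link/ref/main/shape/<profile>).
SHAPE_BASE = "https://cube.link/ref/main/shape/"


def profile_url(profile: str) -> str:
    """Return the assumed shape URL for a known profile name."""
    if profile not in KNOWN_PROFILES:
        raise ValueError(f"unknown validation profile: {profile!r}")
    return SHAPE_BASE + profile


if __name__ == "__main__":
    print(profile_url("profile-visualize"))
```

Rejecting unknown names early means a typo in a profile name fails in the script rather than as an unresolved URL later in the validation run.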
| Proposed Profile | Description | Status |
|---|---|---|
| datacatalog / dataset | Generic dataset metadata (extends OpenData.swiss profile) | |
| Shared dimensions | Validation for reusable dimensions and hierarchies | Partial implementation available from Cube Creator |
| Hierarchies templates | SHACL shapes for dimensional hierarchies | Partial implementation available from Cube Creator |
Note: Zazuko has indicated that Cube Creator already contains shapes for both Shared Dimensions and Hierarchies that could be extracted and shared.
Primary LINDAS validation service: https://cube-validator.lindas.admin.ch
Features:
- SHACL-based cube validation
- Detailed error reporting with severity levels
- Integration with LINDAS SPARQL endpoints
Technical Implementation: The validator uses Barnard59 CLI commands. Example command structure:
npx barnard59 cube fetch-metadata --endpoint "https://lindas.admin.ch/query" --cube "https://environment.ld.admin.ch/foen/ubd010701/8" | npx barnard59 cube check-metadata --profile "https://cube.link/ref/main/shape/standalone-cube-constraint" | npx barnard59 shacl report-summary

Open Questions:
- Observation Tab: Source of observation validation results unclear
- Severity Levels: Mapping between SHACL sh:severity values and displayed severities needs clarification
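The Barnard59 command structure shown above can also be assembled programmatically, which helps when the same check has to run against several cubes. The sketch below only builds the command string from the three stages that appear in this document; the function name `build_validation_command` is hypothetical, and nothing here is taken from the Cube Validator's own source code.

```python
import shlex


def build_validation_command(endpoint: str, cube: str, profile: str) -> str:
    """Return the three-stage Barnard59 pipeline as one shell command line."""
    stages = [
        ["npx", "barnard59", "cube", "fetch-metadata",
         "--endpoint", endpoint, "--cube", cube],
        ["npx", "barnard59", "cube", "check-metadata", "--profile", profile],
        ["npx", "barnard59", "shacl", "report-summary"],
    ]
    # shlex.join quotes each argument where needed, so values containing
    # spaces or shell metacharacters stay a single argument.
    return " | ".join(shlex.join(stage) for stage in stages)


if __name__ == "__main__":
    print(build_validation_command(
        endpoint="https://lindas.admin.ch/query",
        cube="https://environment.ld.admin.ch/foen/ubd010701/8",
        profile="https://cube.link/ref/main/shape/standalone-cube-constraint",
    ))
```

The resulting string can be passed to `subprocess.run(command, shell=True, check=True)` on a machine where Node.js and the Barnard59 packages are installed.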
Interactive SHACL testing: https://shacl-playground.zazuko.com/
Features:
- Real-time SHACL rule testing
- Referenced as "newer implementation" by shacl.org
- Ideal for development and testing
| Tool | Description | URL | Use Case |
|---|---|---|---|
| CSV Validation (Zazuko) | CSV file validation | https://zazuko.com/csv-validate/ | Pre-processing validation |
| CSV-Lint | Generic CSV validation | https://csvlint.io/ | Data quality checks |
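As an illustration of what pre-processing validation of a CSV file can look like, here is a minimal sketch using only the Python standard library. It does not reproduce the checks of either tool in the table; the function name `check_csv`, the sample columns, and the three rules (required columns, consistent field count, no empty cells) are assumptions chosen for the example.

```python
import csv
import io


def check_csv(text: str, required_columns: list[str]) -> list[str]:
    """Return a list of problems found; an empty list means the file passed."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]

    problems = []
    missing = [col for col in required_columns if col not in header]
    if missing:
        problems.append(f"missing columns: {', '.join(missing)}")

    for row in reader:
        # reader.line_num is the physical line number of the row just read.
        if len(row) != len(header):
            problems.append(
                f"line {reader.line_num}: expected {len(header)} fields, found {len(row)}"
            )
        elif any(cell.strip() == "" for cell in row):
            problems.append(f"line {reader.line_num}: empty cell")
    return problems


if __name__ == "__main__":
    sample = "station,year,value\nBern,2020,12.5\nBasel,2021\n"
    for problem in check_csv(sample, ["station", "year", "value"]):
        print(problem)
```

Catching structural problems at this stage keeps them out of the later RDF conversion, where they are harder to trace back to a source line.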
Command-line validation toolkit: https://jena.apache.org/documentation/shacl/
Installation and Usage:
# Install Apache Jena with SHACL support
# Then validate with:
shacl validate --shapes shape_file.ttl --data data_file.ttl

PySHACL: RDFLib SHACL implementation
Suitable for integration into Python-based data processing workflows.
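A minimal sketch of PySHACL inside a Python workflow is shown below. It assumes the third-party packages `pyshacl` and `rdflib` are installed (`pip install pyshacl`). The shape and data are hypothetical examples in the `http://example.org/` namespace, not one of the LINDAS profiles, and the helper name `validate_turtle` is invented for illustration.

```python
SHAPES_TTL = """
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .

# Hypothetical shape: every ex:Cube needs at least one schema:name.
ex:CubeShape a sh:NodeShape ;
    sh:targetClass ex:Cube ;
    sh:property [ sh:path schema:name ; sh:minCount 1 ] .
"""

DATA_TTL = """
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .

ex:good a ex:Cube ; schema:name "Air quality" .
ex:bad a ex:Cube .
"""


def validate_turtle(data_ttl: str, shapes_ttl: str):
    """Return (conforms, number_of_validation_results, report_text)."""
    # Third-party imports kept local so this module loads without them.
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF
    from pyshacl import validate

    sh = Namespace("http://www.w3.org/ns/shacl#")
    data_graph = Graph().parse(data=data_ttl, format="turtle")
    shapes_graph = Graph().parse(data=shapes_ttl, format="turtle")
    conforms, report_graph, report_text = validate(data_graph, shacl_graph=shapes_graph)
    # Each failed constraint appears as one sh:ValidationResult node.
    results = set(report_graph.subjects(RDF.type, sh.ValidationResult))
    return conforms, len(results), report_text


if __name__ == "__main__":
    conforms, count, text = validate_turtle(DATA_TTL, SHAPES_TTL)
    print(f"conforms={conforms}, results={count}")
    print(text)
```

In a real workflow the shapes string would be replaced by a profile retrieved from the Cube Schema repository, and the data by the cube metadata under test.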
Cube Creator validation:
- Automatic validation during cube creation process
- Real-time feedback on data quality issues
- Profile-specific validation based on intended use
Validation can be integrated into automated data processing pipelines:
- Data Ingestion: Initial format validation
- Transformation: Intermediate validation checks
- Publication: Final SHACL compliance validation
Reference implementation: Python Pipeline Template
Note: Template documentation should be updated to reflect current Cube Validator implementation.
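The three pipeline stages listed above (ingestion, transformation, publication) can be sketched as an ordered list of checks that stops at the first failing stage. This is an illustrative outline only and is not taken from the Python Pipeline Template; the check functions, the `rows` and `shacl_conforms` keys, and `run_pipeline` are all hypothetical names, and the publication check is a placeholder for a real SHACL validator call.

```python
def check_ingestion(data: dict) -> list:
    # Initial format validation: the input must contain at least one row.
    return [] if data.get("rows") else ["no rows supplied"]


def check_transformation(data: dict) -> list:
    # Intermediate check: every row needs a numeric "value".
    return [
        f"row {i}: non-numeric value"
        for i, row in enumerate(data.get("rows", []))
        if not isinstance(row.get("value"), (int, float))
    ]


def check_publication(data: dict) -> list:
    # Placeholder for the final SHACL compliance check; a real pipeline
    # would call a SHACL validator here and inspect its report.
    return [] if data.get("shacl_conforms") else ["SHACL report does not conform"]


STAGES = [
    ("ingestion", check_ingestion),
    ("transformation", check_transformation),
    ("publication", check_publication),
]


def run_pipeline(data: dict, stages=STAGES):
    """Run the stages in order; return (failed_stage, problems) or (None, [])."""
    for name, check in stages:
        problems = check(data)
        if problems:
            return name, problems
    return None, []


if __name__ == "__main__":
    print(run_pipeline({"rows": [{"value": "n/a"}]}))
```

Stopping at the first failing stage keeps error reports short and points the publisher at the earliest place where the data went wrong.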
While Python-based validation examples exist, SHACL-based validation is the recommended approach for LINDAS. Code-based validation should only supplement SHACL when specific requirements cannot be expressed in SHACL syntax.
| Approach | Advantages | Disadvantages |
|---|---|---|
| SHACL | Standardized, tool-agnostic, declarative | Learning curve, limited expressiveness |
| Code-based | Flexible, complex logic support | Maintenance overhead, tool-specific |
Original SHACL files created for LINDAS are archived at: https://github.com/zazuko/cube-shacl-validation
Status: No updates since November 2021; superseded by current implementation.
- Initial Implementation (2021): Basic SHACL validation
- Production Integration (2022-2023): Tool integration and refinement
- Profile Specialization (2024): Multiple validation profiles for different use cases
- LINDASnext Preparation (2025): Migration considerations and enhancements
- Validate Early: Use Cube Validator during development
- Choose Appropriate Profile: Select validation profile matching intended use
- Iterative Improvement: Address validation errors incrementally
- Documentation: Document custom validation requirements
- Regular Profile Updates: Keep SHACL files synchronized with requirements
- Tool Integration: Ensure validation is integrated into all data pathways
- Error Monitoring: Track validation failures across the platform
- Community Feedback: Gather input for SHACL improvements
- SHACL Specification: https://www.w3.org/TR/shacl/
- Cube Schema Best Practices: https://github.com/zazuko/rdf-cube-schema/blob/master/best-practice.md
- Zazuko Zulip: https://zulip.zazuko.com/ (development discussions)
- LINDAS Support: support.lindas@bar.admin.ch (platform issues)
For the latest SHACL files and validation profiles, always refer to the Cube Schema repository as the authoritative source.