Add read_rdf_prefixes() table function for RDF namespace introspection#35

Open
nonodename wants to merge 3 commits into main from claude/add-rdf-prefixes-function-lzyQY
@nonodename
Owner

Summary

This PR adds a new read_rdf_prefixes() table function that extracts @prefix and @base declarations from Turtle and TriG RDF files. This enables namespace introspection, documentation generation, and building CURIE-aware tooling without parsing full RDF triples.

Key Changes

  • New table function read_rdf_prefixes(): Reads one or more Turtle or TriG files and returns their prefix/base declarations as rows with columns: prefix (VARCHAR, NULL for @base), uri (VARCHAR), and is_base (BOOLEAN)

  • File format support:

    • Supports Turtle (.ttl) and TriG (.trig) formats
    • Rejects NTriples and NQuads with clear error messages (these formats have no prefix declarations)
    • Auto-detects format from file extension or accepts explicit file_type parameter
  • Parameters:

    • path: File path or glob pattern (required)
    • strict_parsing: Boolean flag (default true) to control error handling for malformed content
    • file_type: Override format detection (values: ttl, turtle, trig)
    • include_filenames: Boolean flag (default false) to add a 4th filename column showing source file
  • Implementation details:

    • Uses libserd for efficient prefix extraction without full RDF parsing
    • Pre-computes all rows during global initialization for thread-safe scanning
    • Handles @base declarations (which have no prefix name) by emitting SQL NULL in the prefix column
    • Supports glob patterns for batch processing multiple files
    • Proper error handling with file validation at bind time
  • Documentation: Added comprehensive function documentation to docs/functions.md and usage examples to README.md

  • Tests: Added 20+ test cases covering basic functionality, glob patterns, file type validation, error cases, and parameter combinations
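The format rules listed above (extension-based detection, an explicit file_type override, and rejection of formats without prefix declarations) can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `RdfFileType`, `ParseFileType`, and `ResolvePrefixFileType` are hypothetical names, `std::invalid_argument` stands in for DuckDB's InvalidInputException, and the `nt`/`ntriples`/`nq`/`nquads` spellings are assumptions that do not appear in the description.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the extension's file type enum.
enum class RdfFileType { Turtle, TriG, NTriples, NQuads, Unknown };

// Map a file extension or an explicit file_type value to a format.
// Matching is case-insensitive; the N-Triples/N-Quads spellings are assumed.
RdfFileType ParseFileType(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (name == "ttl" || name == "turtle") return RdfFileType::Turtle;
    if (name == "trig") return RdfFileType::TriG;
    if (name == "nt" || name == "ntriples") return RdfFileType::NTriples;
    if (name == "nq" || name == "nquads") return RdfFileType::NQuads;
    return RdfFileType::Unknown;
}

// Resolve the format for one path. An explicit override wins; otherwise the
// file extension decides. Formats without prefix declarations are rejected
// here, before any file content is read.
RdfFileType ResolvePrefixFileType(const std::string &path,
                                  const std::string &file_type_override) {
    RdfFileType type = RdfFileType::Unknown;
    if (!file_type_override.empty()) {
        type = ParseFileType(file_type_override);
    } else {
        const auto dot = path.find_last_of('.');
        if (dot != std::string::npos) {
            type = ParseFileType(path.substr(dot + 1));
        }
    }
    switch (type) {
    case RdfFileType::Turtle:
    case RdfFileType::TriG:
        return type;
    case RdfFileType::NTriples:
    case RdfFileType::NQuads:
        throw std::invalid_argument(
            "read_rdf_prefixes: NTriples and NQuads have no prefix declarations: " + path);
    default:
        throw std::invalid_argument(
            "read_rdf_prefixes: cannot determine file type for " + path);
    }
}
```

Doing this check while resolving the path, rather than while scanning, is what lets an unsupported file fail early with a message that names it.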

Notable Implementation Details

  • Uses SERD callbacks (PrefixBaseCallback, PrefixNameCallback) to extract only prefix declarations without parsing full RDF statements
  • A pre-computed PrefixRow vector keeps scanning thread-safe: rows are fixed after initialization, the scan runs single-threaded, and the read position is tracked atomically
  • File type validation happens at bind time to provide early error feedback
  • Integrates seamlessly with existing RDF extension infrastructure (reuses ITriplesBuffer::FileType and detection logic)
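One way to picture the pre-computed row vector with atomic position tracking is the sketch below. It is an assumption-laden illustration, not the code in src/read_rdf_prefixes.cpp: only the name PrefixRow comes from the description, while its fields, `PrefixScanState`, and `NextBatch` are hypothetical. An empty `std::optional` models the SQL NULL emitted in the prefix column for @base rows.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One output row. An empty optional models SQL NULL, used for @base rows.
struct PrefixRow {
    std::optional<std::string> prefix;
    std::string uri;
    bool is_base;
};

// Hypothetical global scan state: rows are filled once during
// initialization and are only read afterwards.
struct PrefixScanState {
    std::vector<PrefixRow> rows;
    std::atomic<std::size_t> position{0};

    explicit PrefixScanState(std::vector<PrefixRow> precomputed)
        : rows(std::move(precomputed)) {}

    // Claim up to max_rows rows and return the half-open range [begin, end).
    // begin == end means the scan is exhausted. fetch_add hands every caller
    // a distinct starting offset, so no two callers emit the same row.
    std::pair<std::size_t, std::size_t> NextBatch(std::size_t max_rows) {
        std::size_t begin = position.fetch_add(max_rows);
        begin = std::min(begin, rows.size());
        const std::size_t end = std::min(begin + max_rows, rows.size());
        return {begin, end};
    }
};
```

Because the vector is never modified after construction, the atomic counter is the only shared mutable state, which keeps the scan function itself free of locks.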

https://claude.ai/code/session_01VZXDXseqTYk3vvxypTwQYw

claude and others added 3 commits April 8, 2026 13:43
Returns @prefix and @base declarations from Turtle and TriG files as a
3-column table: prefix (VARCHAR, NULL for @base), uri (VARCHAR),
is_base (BOOLEAN). Supports the same strict_parsing, file_type, and
include_filenames parameters as read_rdf(), and glob patterns.
Throws InvalidInputException for NTriples, NQuads, RDF/XML, and
unknown file types at bind time for clean error propagation.

- src/read_rdf_prefixes.cpp — core implementation
- src/include/read_rdf_prefixes.hpp — registration header
- test/sql/read_rdf_prefixes.test — 30 test assertions
- CMakeLists.txt — add new source file
- src/rdf_extension.cpp — register function in LoadInternal()
- docs/functions.md — function reference documentation
- README.md — usage section with example output
- TODO.md — mark item #3 as complete

https://claude.ai/code/session_01VZXDXseqTYk3vvxypTwQYw