Work with RDF-related concepts, datasets, and files in Go.
- Decode TriG, N-Quads, XML, JSON-LD, HTML, and other RDF-based sources.
- Use blank nodes, literals, triples, quads, iterators and other RDF primitives.
- Reference IRI constants from rdf, owl, xsd, and custom vocabularies.
- Expand, compact, and parse IRI prefixes and CURIEs.
- Canonicalize datasets with RDFC-1.0.
Refer to the code's documentation (pkg.go.dev). Below are a few packages to get started with...
// rdf contains primitives for statements and value types
import "github.com/dpb587/rdfkit-go/rdf"
// encoding subpackages are well-known formats with encoders/decoders
import "github.com/dpb587/rdfkit-go/encoding/nquads"
// rdfio offers simplified access to files and encodings
import "github.com/dpb587/rdfkit-go/rdfio"
// rdfcanon implements the dataset canonicalization algorithm
import "github.com/dpb587/rdfkit-go/rdfcanon"
The examples submodule demonstrates some common use cases and starter snippets.
html-extract$ go run . https://microsoft.com
@base <https://www.microsoft.com/en-us> .
@prefix og: <http://ogp.me/ns#> .
@prefix schema: <http://schema.org/> .
<>
a schema:WebSite ;
schema:potentialAction [
a schema:SearchAction ;
schema:query-input "required name=search_term_string" ;
schema:target [
a schema:EntryPoint ;
schema:urlTemplate "https://www.microsoft.com/en-us/search/explore?q={search_term_string}&ocid=AID_seo_sitelinks_search"
]
] ;
schema:url <> .
<>
og:description "Explore Microsoft products and services and support for your home or business. Shop Microsoft 365, Copilot, Teams, Xbox, Windows, Azure, Surface and more."@en-US ;
og:title "Microsoft – AI, Cloud, Productivity, Computing, Gaming & Apps"@en-US ;
og:type "website"@en-US ;
og:url "https://www.microsoft.com/en-us"@en-US .
[]
a schema:Organization ;
schema:logo <https://uhf.microsoft.com/images/microsoft/RE1Mu3b.png> ;
schema:name "Microsoft" ;
schema:url <https://www.microsoft.com> .
The cmd/rdfkit submodule offers command line access to some common tasks. Refer to its subpackages to learn more about their implementations.
rdfkit$ go run . --help
Usage:
rdfkit [command]
Available Commands:
canonicalize Convert a dataset into canonical blank nodes and ordering
completion Generate the autocompletion script for the specified shell
export-dot Generate a Graphviz DOT visualization from an ontology
export-go-iri Generate a Go file of IRI constants from an ontology
help Help about any command
pipe Decode and re-encode using supported encoding formats
version Print version information
Flags:
-h, --help help for rdfkit
Use "rdfkit [command] --help" for more information about a command.
rdfkit$ go run . pipe --help
Decode and re-encode using supported encoding formats
Usage:
rdfkit pipe [flags]
Flags:
-h, --help help for pipe
-i, --in string path or IRI for reading (default stdin)
--in-base string override the base IRI of the resource
--in-param stringArray extra decode configuration parameters (syntax "KEY[=VALUE]")
--in-param-io stringArray extra read configuration parameters (syntax "KEY[=VALUE]")
--in-type string name or alias for the decoder (default detect)
-o, --out string path or IRI for writing (default stdout)
--out-base string override the base IRI of the resource
--out-param stringArray extra encode configuration parameters (syntax "KEY[=VALUE]")
--out-param-io stringArray extra write configuration parameters (syntax "KEY[=VALUE]")
--out-type string name or alias for the encoder (default detect or nquads)
Encodings:
org.json-ld.document (decode)
Aliases: jsonld
File Extensions: .jsonld
Media Types: application/ld+json
--in-param captureTextOffsets[=bool]
Capture the line+column offsets for statement properties
--in-param tokenizer.lax[=bool]
Accept and recover common syntax errors
org.w3.n-quads (decode, encode)
Aliases: n-quads, nq, nquads
File Extensions: .nq
Media Types: application/n-quads
--in-param captureTextOffsets[=bool]
Capture the line+column offsets for statement properties
--out-param ascii[=bool]
Use escape sequences for non-ASCII characters
org.w3.n-triples (decode, encode)
Aliases: n-triples, nt, ntriples
File Extensions: .nt
Media Types: application/n-triples
--in-param captureTextOffsets[=bool]
Capture the line+column offsets for statement properties
--out-param ascii[=bool]
Use escape sequences for non-ASCII characters
org.w3.rdf-json (decode, encode)
Aliases: rdf-json, rdfjson, rj
File Extensions: .rj
Media Types: application/rdf+json
--in-param captureTextOffsets[=bool]
Capture the line+column offsets for statement properties
org.w3.rdf-xml (decode)
Aliases: rdf-xml, rdfxml, xml
File Extensions: .rdf
Media Types: application/rdf+xml
--in-param captureTextOffsets[=bool]
Capture the line+column offsets for statement properties
org.w3.trig (decode)
Aliases: trig
File Extensions: .trig
Media Types: application/trig
--in-param captureTextOffsets[=bool]
Capture the line+column offsets for statement properties
org.w3.turtle (decode, encode)
Aliases: ttl, turtle
File Extensions: .ttl
Media Types: text/turtle
--in-param captureTextOffsets[=bool]
Capture the line+column offsets for statement properties
--out-param buffered[=bool]
Load all statements into memory before writing any output
--out-param iris.useBase[=bool]
Prefer IRIs relative to the resource IRI
--out-param iris.usePrefix=string...
Prefer IRIs using a prefix. Use the syntax of "{prefix}:{iri}", "rdfa-context", or "none"
--out-param resources[=bool]
Write nested statements and resource descriptions (implies buffered=true)
public.html (decode)
Aliases: htm, html, xhtml
File Extensions: .htm, .html, .xhtml
Media Types: application/xhtml+xml, text/html, text/xhtml+xml
--in-param captureTextOffsets[=bool]
Capture the line+column offsets for statement properties
Based on the Resource Description Framework (RDF), there are three primitive value types, aka terms, that are used to represent data: IRIs, literals, and blank nodes. The primitive value types are the basis of triples and other assertions about information.
An IRI records a URL-based identity as a string value.
resourceIRI := rdf.IRI("http://example.com/resource")
Some well-known IRIs are defined in subpackages such as rdfiri and xsdiri - see Ontologies for more details. The iri package provides additional support for mapping IRIs from prefixes and CURIEs.
A literal records more traditional data values, such as booleans and strings. It must include both a datatype (IRI) and its string-encoded data. The lexical form should always follow the datatype-specific recommendations for valid data.
trueLiteral := rdf.Literal{
Datatype: xsdiri.Boolean_Datatype,
LexicalForm: "true",
}
A Tag field is supported for limited datatypes, namely rdf:langString, where additional properties are required for the literal.
helloWorldLiteral := rdf.Literal{
Datatype: rdfiri.LangString_Datatype,
LexicalForm: "Hello World",
Tag: rdf.LanguageLiteralTag{
Language: "en",
},
}
Literals can be tedious to work with, so some well-known data types have Go primitives and utility functions - see Ontologies for more details.
helloWorldLiteral == rdfobject.NewLangString("en", "Hello World")
A blank node represents an anonymous resource and is always created with a unique, internal identifier. Two blank nodes are equivalent if and only if they have the same identifier.
bnode := rdf.NewBlankNode()
bnode.Identifier != rdf.NewBlankNode().Identifier
The blanknodes package provides additional support for using string-based identifiers (e.g. b0) and other utilities.
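To illustrate the idea of mapping string-based identifiers onto unique blank nodes, here is a minimal, standard-library-only sketch. The `blankNode`, `labelTable`, and `newLabelTable` names are hypothetical stand-ins invented for this example; they are not the API of the blanknodes package.

```go
package main

import "fmt"

// blankNode is a hypothetical stand-in for an anonymous resource.
// Two values are equal only if they carry the same internal identifier.
type blankNode struct{ id uint64 }

// labelTable hands out one blank node per string label (e.g. "b0").
// This is an illustrative assumption, not rdfkit-go's implementation.
type labelTable struct {
	next    uint64
	byLabel map[string]blankNode
}

func newLabelTable() *labelTable {
	return &labelTable{byLabel: map[string]blankNode{}}
}

// Get returns the node previously created for label, or creates a new one.
func (t *labelTable) Get(label string) blankNode {
	if n, ok := t.byLabel[label]; ok {
		return n
	}
	t.next++
	n := blankNode{id: t.next}
	t.byLabel[label] = n
	return n
}

func main() {
	table := newLabelTable()
	fmt.Println(table.Get("b0") == table.Get("b0")) // prints "true"
	fmt.Println(table.Get("b0") == table.Get("b1")) // prints "false"
}
```

A table like this is typically scoped to a single document, since the same label in two different files usually refers to two different anonymous resources.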
A triple is used to describe some sort of statement about the world. Within the triple, a subject is said to have some relationship, the predicate, with an object.
nameTriple := rdf.Triple{
Subject: rdf.NewBlankNode(),
Predicate: schemairi.Name_Property,
Object: helloWorldLiteral,
}
The rdf package includes other supporting types (e.g. TripleList, TripleIterator, and TripleMatcher), and the triples package offers additional interfaces and utilities for working with triples.
A quad is used to describe a triple with an optional graph name.
nameQuad := rdf.Quad{
Triple: nameTriple,
GraphName: rdf.IRI("http://example.com/graph"),
}
The rdf and quads packages offer additional interfaces and utilities for working with quads.
The fields of triples and quads are restricted (with interfaces) to the normative value types they support, described by the table below.
| Field | IRI | Literal | Blank Node | nil |
|---|---|---|---|---|
| Subject | Valid | Invalid | Valid | Invalid |
| Predicate | Valid | Invalid | Invalid | Invalid |
| Object | Valid | Valid | Valid | Invalid |
| GraphName | Valid | Invalid | Valid | Valid |
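One way to enforce a table like this at compile time is with marker interfaces. The sketch below is only an illustration under that assumption: every type and method name in it is hypothetical, and the actual interfaces in the rdf package may be shaped differently.

```go
package main

import "fmt"

// Hypothetical term types for illustration only.
type iri string
type literal struct {
	datatype    iri
	lexicalForm string
}
type blankNode struct{ id int }

// One marker interface per statement field.
type subjectValue interface{ isSubject() }
type predicateValue interface{ isPredicate() }
type objectValue interface{ isObject() }
type graphNameValue interface{ isGraphName() }

// An IRI is valid in every field.
func (iri) isSubject()   {}
func (iri) isPredicate() {}
func (iri) isObject()    {}
func (iri) isGraphName() {}

// A literal is only valid as an object.
func (literal) isObject() {}

// A blank node is valid as a subject, object, or graph name.
func (blankNode) isSubject()   {}
func (blankNode) isObject()    {}
func (blankNode) isGraphName() {}

type triple struct {
	subject   subjectValue
	predicate predicateValue
	object    objectValue
}

// quad leaves graphName nil when no graph name is given.
type quad struct {
	triple    triple
	graphName graphNameValue
}

func main() {
	q := quad{triple: triple{
		subject:   blankNode{id: 1},
		predicate: iri("http://schema.org/name"),
		object:    literal{datatype: iri("http://www.w3.org/2001/XMLSchema#string"), lexicalForm: "Hello World"},
	}}
	fmt.Println(q.graphName == nil) // prints "true"

	// A literal does not satisfy subjectValue, so triple{subject: literal{}}
	// would be rejected by the compiler.
	_, literalIsSubject := q.triple.object.(subjectValue)
	fmt.Println(literalIsSubject) // prints "false"
}
```

Note that the interfaces alone cannot rule out nil for Subject, Predicate, or Object; that part of the table would still need a runtime check.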
A graph is a set of triples, all of which collectively describe the state of a world. The triples.Graph* interfaces describe basic operations, such as adding or iterating triples.
err := graph.AddTriple(ctx, nameTriple)
iter, err := graph.NewTripleIterator(ctx)
A dataset is a set of graphs, each of which is a set of triples. The quads.Dataset* interfaces describe basic operations, such as adding or iterating quads.
err := dataset.AddQuad(ctx, nameQuad)
iter, err := dataset.NewQuadIterator(ctx)
The usage of a dataset vs graph vs dataset graphs is very application-specific. For broader discussion on the semantics and logical considerations of datasets, review this W3C Note.
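As a rough mental model of the relationship between quads, graphs, and datasets, the following standard-library-only sketch groups quads by graph name. The `triple`, `quad`, and `dataset` types are hypothetical and use plain strings for terms; they are not the quads.Dataset* interfaces.

```go
package main

import "fmt"

// Terms are plain strings here purely to keep the sketch short.
type triple struct{ s, p, o string }

// quad pairs a triple with a graph name; "" stands for no graph name.
type quad struct {
	triple    triple
	graphName string
}

// dataset maps each graph name to the set of triples in that graph.
type dataset map[string]map[triple]struct{}

// addQuad stores the quad's triple in the graph selected by its graph name.
// Adding the same quad twice has no further effect because graphs are sets.
func (d dataset) addQuad(q quad) {
	g, ok := d[q.graphName]
	if !ok {
		g = map[triple]struct{}{}
		d[q.graphName] = g
	}
	g[q.triple] = struct{}{}
}

func main() {
	d := dataset{}
	name := triple{"_:b0", "http://schema.org/name", `"Hello World"@en`}

	d.addQuad(quad{name, "http://example.com/graph"})
	d.addQuad(quad{name, "http://example.com/graph"}) // duplicate, ignored
	d.addQuad(quad{name, ""})                         // same triple, no graph name

	fmt.Println(len(d), len(d["http://example.com/graph"])) // prints "2 1"
}
```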
The experimental inmemory package offers a dataset implementation which may be useful for small collections and labeled property graph features.
storage := inmemory.NewDataset()
Better-supported storage or alternative, remote service clients will likely be a focus in the future.
An encoding (or file format) is used to decode and encode RDF data. The following encodings are available under the encoding package.
| Package | Decode | Encode |
|---|---|---|
| htmljsonld | Quad | - |
| htmlmicrodata | Triple | - |
| htmlrdfa | Triple | - |
| jsonld | Quad | Quad, Description |
| nquads | Quad | Quad |
| ntriples | Triple | Triple |
| rdfjson | Triple | Triple |
| rdfxml | Triple | - |
| trig | Quad | - |
| turtle | Triple | Triple, Description |
Encodings provide a NewDecoder function which requires an io.Reader and accepts optional DecoderConfig options. It can be used as an iterator for all statements found in the encoding. Depending on the capabilities of the encoding format, the decoder fulfills either the encoding.TripleDecoder or encoding.QuadDecoder interface.
decoder := nquads.NewDecoder(os.Stdin)
defer decoder.Close()
for decoder.Next() {
quad := decoder.Quad()
fmt.Fprintf(os.Stdout, "%v\t%v\t%v\t%v\n", quad.Triple.Subject, quad.Triple.Predicate, quad.Triple.Object, quad.GraphName)
}
err := decoder.Err()
Most are stream processors, so valid statements may be produced before a syntax error is encountered. When a syntax error occurs, the byte offset (and text offset, when enabled) of the occurrence is included.
Most decoders can capture the exact byte and line+column offsets where a statement's subject, predicate, object, and graph name were decoded from the source. The encoding.StatementTextOffsetsProvider interface supports accessing a StatementTextOffsets map of property-offsets. To enable capturing this metadata, use the decoder's CaptureTextOffsets option.
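The streaming behavior can be pictured with a toy decoder that follows the same Next/Err iteration pattern. This is a simplified sketch using only the standard library: `lineDecoder` and `decodeAll` are hypothetical, and the "format" (three whitespace-separated terms and a trailing dot per line) is far cruder than real N-Triples.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// lineDecoder is a hypothetical streaming decoder for illustration only.
type lineDecoder struct {
	scanner *bufio.Scanner
	offset  int64 // byte offset where the next line starts
	current [3]string
	err     error
}

func newLineDecoder(r io.Reader) *lineDecoder {
	return &lineDecoder{scanner: bufio.NewScanner(r)}
}

// Next advances to the next statement. It returns false at the end of the
// input or on the first syntax error, which Err then reports.
func (d *lineDecoder) Next() bool {
	if d.err != nil {
		return false
	}
	for d.scanner.Scan() {
		line := d.scanner.Text()
		start := d.offset
		d.offset += int64(len(line)) + 1 // assumes "\n" line endings
		fields := strings.Fields(line)
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 4 || fields[3] != "." {
			d.err = fmt.Errorf("syntax error at byte %d", start)
			return false
		}
		copy(d.current[:], fields[:3])
		return true
	}
	d.err = d.scanner.Err()
	return false
}

func (d *lineDecoder) Triple() [3]string { return d.current }
func (d *lineDecoder) Err() error        { return d.err }

// decodeAll collects every statement read before the input ends or fails.
func decodeAll(input string) ([][3]string, error) {
	d := newLineDecoder(strings.NewReader(input))
	var out [][3]string
	for d.Next() {
		out = append(out, d.Triple())
	}
	return out, d.Err()
}

func main() {
	input := "<a> <b> <c> .\n<d> <e> <f> .\nnot a statement\n<g> <h> <i> .\n"
	triples, err := decodeAll(input)
	// Two triples are returned even though the third line is invalid;
	// the error reports byte 28, where the invalid line starts.
	fmt.Println(len(triples), err)
}
```

The practical consequence is that callers should always check Err after the loop, and should decide up front whether statements already processed from a partially valid source need to be rolled back.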
for decoder.Next() {
for propertyType, propertyOffsets := range decoder.StatementTextOffsets() {
fmt.Fprintf(
os.Stderr,
"> found %s from L%dC%d (byte %d) until %s (byte %d)\n",
encoding.StatementOffsetsTypeName(propertyType),
propertyOffsets.From.LineColumn[0],
propertyOffsets.From.LineColumn[1],
propertyOffsets.From.Byte,
// same as L%dC%d
propertyOffsets.Until.LineColumn.TextOffsetRangeString(),
propertyOffsets.Until.Byte,
)
}
}
When working with offsets, consider the following caveats.
- Capturing and processing text offsets impacts the performance and memory.
- Offsets for some properties may not always be available due to decoding limitations.
- Offsets for some properties may be "incomplete" due to stream processing. For example, turtle may only refer to the opening [ token of an anonymous resource when the closing ] token has not yet been read.
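For readers unfamiliar with how byte offsets relate to line+column positions, the following sketch derives one from the other. It is an illustration only: `textOffset` and `offsetAt` are hypothetical names, and the choice of 1-based lines and columns counted in Unicode code points is an assumption rather than a statement about how rdfkit-go counts.

```go
package main

import "fmt"

// textOffset pairs a byte offset with its line and column.
// Lines and columns are 1-based in this sketch (an assumption).
type textOffset struct {
	Byte   int
	Line   int
	Column int
}

// offsetAt walks src up to byteOffset and counts newlines along the way.
// Columns advance once per Unicode code point, not once per byte.
func offsetAt(src string, byteOffset int) textOffset {
	pos := textOffset{Byte: byteOffset, Line: 1, Column: 1}
	for i, r := range src {
		if i >= byteOffset {
			break
		}
		if r == '\n' {
			pos.Line++
			pos.Column = 1
		} else {
			pos.Column++
		}
	}
	return pos
}

func main() {
	src := "<a> <b> <c> .\n<d> <e> <f> .\n"
	pos := offsetAt(src, 18) // byte 18 is the "<" of "<e>"
	fmt.Printf("L%dC%d (byte %d)\n", pos.Line, pos.Column, pos.Byte) // prints "L2C5 (byte 18)"
}
```

A streaming decoder would maintain these counters incrementally instead of rescanning the source, which is part of why capturing offsets has a performance and memory cost.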
A few encodings similarly provide a NewEncoder requiring an io.Writer and EncoderConfig options. At a minimum, encoders fulfill the encoding.TripleEncoder or encoding.QuadEncoder interfaces.
encoder := nquads.NewEncoder(os.Stdout)
defer encoder.Close()
for _, quad := range quadList {
err := encoder.AddQuad(ctx, quad)
}
When encoding data, the Close method must be called before the data can be successfully decoded.
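One plausible reason for the Close requirement is internal buffering (another is that some formats need closing tokens). The sketch below demonstrates the buffering case with the standard library only. `bufferedEncoder` and `encodeLines` are hypothetical, and rdfkit-go's encoders may behave differently.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// bufferedEncoder is a hypothetical encoder that buffers its output.
type bufferedEncoder struct{ w *bufio.Writer }

func newBufferedEncoder(w io.Writer) *bufferedEncoder {
	return &bufferedEncoder{w: bufio.NewWriter(w)}
}

// AddLine queues one already-serialized statement.
func (e *bufferedEncoder) AddLine(line string) error {
	_, err := e.w.WriteString(line + "\n")
	return err
}

// Close flushes anything still held in the buffer to the destination.
func (e *bufferedEncoder) Close() error { return e.w.Flush() }

// encodeLines reports how many bytes reached the destination before and
// after Close was called.
func encodeLines(lines []string) (beforeClose, afterClose int, err error) {
	var out bytes.Buffer
	enc := newBufferedEncoder(&out)
	for _, line := range lines {
		if err := enc.AddLine(line); err != nil {
			return 0, 0, err
		}
	}
	beforeClose = out.Len()
	if err := enc.Close(); err != nil {
		return beforeClose, out.Len(), err
	}
	return beforeClose, out.Len(), nil
}

func main() {
	before, after, err := encodeLines([]string{"<a> <b> <c> ."})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(before, after) // prints "0 14"
}
```

Because Close can itself fail, it is worth checking its returned error explicitly on the success path rather than relying only on a bare defer.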
The rdfdescription package offers an alternative method for describing nested resources and statements.
resource := rdfdescription.SubjectResource{
Subject: rdf.IRI("http://example.com/product"),
Statements: rdfdescription.StatementList{
rdfdescription.ObjectStatement{
Predicate: rdfiri.Type_Property,
Object: schemairi.Product_Thing,
},
rdfdescription.AnonResourceStatement{
Predicate: schemairi.Offer_Property,
AnonResource: rdfdescription.AnonResource{
Statements: rdfdescription.StatementList{
rdfdescription.ObjectStatement{
Predicate: rdfiri.Type_Property,
Object: schemairi.Offer_Thing,
},
rdfdescription.ObjectStatement{
Predicate: schemairi.Price_Property,
Object: schemaobject.Number(55),
},
rdfdescription.ObjectStatement{
Predicate: schemairi.PriceCurrency_Property,
Object: schemaobject.Text("USD"),
},
},
},
},
},
}
A description can be converted to triples by calling its NewTriples function. Each invocation creates new blank nodes for anonymous resources, so the triples returned from subsequent invocations may be non-isomorphic.
resourceTriples := resource.NewTriples()
Some encodings support a syntax for structured statements (e.g. JSON-LD, Turtle) and implement the rdfdescriptionutil.Encoder or rdfdescriptionutil.DatasetEncoder interface.
err := turtleEncoder.AddResource(ctx, resource)
The ResourceListBuilder may be used to construct resources from their triples. Once constructed, they can be enumerated with ExportResources() or sent directly to supported encoders.
builder := rdfdescription.NewResourceListBuilder()
// builder.Add(rdf.Triple{...}, ...)
err := builder.ToResourceWriter(ctx, turtleEncoder, rdfdescription.DefaultExportResourceOptions)
The rdfcanon package implements the RDFC-1.0 algorithm based on RDF Dataset Canonicalization.
canonicalized, err := rdfcanon.Canonicalize(quadIterator)
Once canonicalized, the encoded N-Quads form can be directly written to an io.Writer.
_, err := canonicalized.WriteTo(os.Stdout)
Alternatively, use NewIterator to manually iterate over the results containing its encoded form. If the BuildCanonicalQuad option was enabled, use NewQuadIterator for a standard rdf.QuadIterator of quads with the canonical blank nodes.
canonicalized, err := rdfcanon.Canonicalize(quadIterator, rdfcanon.CanonicalizeConfig{}.
SetBuildCanonicalQuad(true),
)
iter := canonicalized.NewQuadIterator()
An ontology (or vocabulary) offers domain-specific conventions for working with data. Several well-known ontologies are within the ontology package and offer IRI constants, helpers for literals, and other data utilities.
- earl - earliri, earltesting
- foaf - foafiri
- owl - owliri
- rdf - rdfiri, rdfliteral, and rdfvalue
- rdfa - rdfairi
- rdfs - rdfsiri
- shacl - shacliri
- xsd - xsdiri, xsdobject, xsdtype, and other utilities
To help maintain consistency, the following practices are used for the naming and implementations.
- The {prefix} should be based on RDFa Core Initial Context, vann:preferredNamespacePrefix, or similarly-defined term.
- {prefix}iri package - constants for resource IRIs defined in the vocabulary. The irigen command can be used for most of these.
  - const Base rdf.IRI - the preferred base IRI. For example, http://www.w3.org/1999/02/22-rdf-syntax-ns#.
  - const {Name}_{Type} rdf.IRI - For example, the statement rdf:type a rdf:Property becomes the constant rdfiri.Type_Property with a value of Base + "type". If a resource is defined with multiple types, the first type listed in the vocabulary should be used.
- {prefix}object package - convenience functions wrapping {prefix}type for rdf.ObjectValue-related types.
  - func {Datatype}(...) rdf.ObjectValue - factory-style functions for returning a canonical rdf.ObjectValue value.
  - func Map{Datatype}(lexicalForm string) (rdf.ObjectValue, error) - for mapping a lexical form into a canonical rdf.ObjectValue value.
- {prefix}type package - Go-native types and utilities for working with defined datatypes.
  - type {Datatype} {any} - a Go-native type representing canonical value forms, satisfying the objecttypes.Value interface.
  - func Map{Datatype}(lexicalForm string) ({Datatype}, error) - for mapping a lexical form into its Go-native type.
Mapping functions can decode the lexical form to return a Go-native type which represents the datatype (or error due to invalid input).
trueValue, err := xsdvalue.MapBoolean(trueLiteral.LexicalForm)
trueValue == xsdvalue.Boolean(true)
bool(trueValue) == true
trueValue.AsLiteralTerm() == trueLiteral
The top-level iri package provides utilities for transforming IRIs. The string type is used for a plain IRI, and it must be cast as an rdf.IRI when used within RDF statements.
A common practice with IRIs is defining prefixes that may be used to expand and compact IRIs. These prefixes are often used in encoding formats.
prefixes := iri.NewPrefixManager(iri.PrefixMappingList{
iri.PrefixMapping{
Prefix: "ex",
Expanded: "http://example.com/",
},
})
rIRI, ok := prefixes.ExpandPrefix(iri.PrefixReference{
Prefix: "ex",
Resource: "resource",
})
ok && rIRI == rdf.IRI("http://example.com/resource")
_, ok = prefixes.ExpandPrefix(iri.PrefixReference{
Prefix: "unknown",
Resource: "resource",
})
!ok
pr, ok := prefixes.CompactPrefix(rdf.IRI("http://example.com/resource"))
ok && pr.Prefix == "ex" && pr.Reference == "resource"
_, ok = prefixes.CompactPrefix(rdf.IRI("https://example.com/secure"))
!ok
The curie package provides several functions for working with CURIE prefixes based on CURIE Syntax.
rCURIE, ok := curie.Parse("[ex:resource]")
ok && rCURIE.Safe && rCURIE.Prefix == "ex" && rCURIE.Reference == "resource"
mappings := curie.MappingScope{
Prefixes: prefixes,
}
rIRI, ok := mappings.ExpandCURIE(rCURIE)
ok && rIRI == "http://example.com/resource"
The rdfacontext package provides a list of prefix mappings defined by the W3C at RDFa Core Initial Context. This includes prefixes such as owl:, rdfa:, and xsd:. The list of widely-used prefixes is included as well, which includes prefixes such as dc: and schema:.
commonPrefixes := rdfacontext.NewWidelyUsedInitialContext()
- RDF 1.2 - not currently supported; likely to add primitive type support soon, encodings later.
- Generalized RDF usage is not currently supported.
- EARL Reports for well-known test suites are published as build artifacts (preview).
- This is periodically updated from a private fork and internal usage. There will be some breaking changes before starting to version this module.