The graph is stored in a data structure with the following type:
interface Graph {
source: Token[]
target: Token[]
edges: Record<string, Edge>
comment?: string
}
interface Token {
text: string
/** an identifier that is unique over the whole graph */
id: string
}
interface Edge {
/** a convenience copy of the identifier used in the edges object of the graph */
id: string
/** these are ids to source and target tokens */
ids: string[]
/** labels on this edge */
labels: string[]
/** is this manually or automatically aligned */
manual: boolean
comment?: string
}
The graph is subject to an invariant (checked with the function check_invariant):
- all defined identifiers are unique over the graph
- all referenced identifiers exist
- each token is referenced in one edge
- the text of each token matches the regex /\s*\S+\s+/
- the graph is aligned
- the graph comment and each edge comment is a non-empty string if present
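The rules above can be sketched in code as follows. This is a minimal illustration, not the actual check_invariant: the name findViolations and its return shape are hypothetical, "referenced in one edge" is read here as "exactly one", the regex is used unanchored as written, and the "graph is aligned" rule is left out because its definition is not given in this section.

```typescript
interface Token { text: string; id: string }
interface Edge { id: string; ids: string[]; labels: string[]; manual: boolean; comment?: string }
interface Graph { source: Token[]; target: Token[]; edges: Record<string, Edge>; comment?: string }

/** Hypothetical helper: lists violated rules; an empty list means every checked rule holds. */
function findViolations(g: Graph): string[] {
  const problems: string[] = []
  const tokens = [...g.source, ...g.target]
  const edges = Object.keys(g.edges).map(key => g.edges[key])

  // All defined identifiers (token ids and edge keys) are unique over the graph.
  const defined = new Set<string>()
  for (const id of [...tokens.map(t => t.id), ...Object.keys(g.edges)]) {
    if (defined.has(id)) problems.push(`duplicate identifier: ${id}`)
    defined.add(id)
  }

  // All referenced identifiers exist; count how often each token is referenced.
  const tokenIds = new Set(tokens.map(t => t.id))
  const refCount = new Map<string, number>()
  for (const e of edges) {
    for (const id of e.ids) {
      if (!tokenIds.has(id)) problems.push(`edge ${e.id} references unknown token ${id}`)
      refCount.set(id, (refCount.get(id) || 0) + 1)
    }
  }

  // Each token is referenced in one edge (assumption: exactly one).
  for (const t of tokens) {
    const n = refCount.get(t.id) || 0
    if (n !== 1) problems.push(`token ${t.id} is referenced by ${n} edges`)
  }

  // Token texts match the regex from the list above (unanchored, as written there).
  for (const t of tokens) {
    if (!/\s*\S+\s+/.test(t.text)) problems.push(`token ${t.id} has malformed text`)
  }

  // Comments, when present, are non-empty strings.
  for (const c of [g.comment, ...edges.map(e => e.comment)]) {
    if (c !== undefined && c.length === 0) problems.push('empty comment')
  }

  return problems
}
```

Returning a list of messages rather than a boolean is a design choice of this sketch; it makes it easier to see which rule a malformed graph breaks.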
A graph where the source and target texts are both w1 w2:
{
"source": [{"id": "s0", "text": "w1 "}, {"id": "s1", "text": "w2 "}],
"target": [{"id": "t0", "text": "w1 "}, {"id": "t1", "text": "w2 "}],
"edges": {
"e-s0-t0": {
"id": "e-s0-t0",
"ids": ["s0", "t0"],
"labels": [],
"manual": false
},
"e-s1-t1": {
"id": "e-s1-t1",
"ids": ["s1", "t1"],
"labels": [],
"manual": false
}
}
}
The source word apa automatically aligned with bepa, with the label "A":
{
"source": [{"id": "s0", "text": "apa "}],
"target": [{"id": "t0", "text": "bepa "}],
"edges": {
"e-s0-t0": {
"id": "e-s0-t0",
"ids": ["s0", "t0"],
"labels": ["A"],
"manual": false
}
}
}
The data also has a derived form, which is used to draw the graph in the interface. Its data types look like this:
interface Dropped {
edit: 'Dropped'
target: Token
id: string
manual: boolean
}
interface Dragged {
edit: 'Dragged'
source: Token
id: string
manual: boolean
}
interface Edited {
edit: 'Edited'
source: Token[]
target: Token[]
id: string
manual: boolean
}
The names reflect that Edited tokens have a fixed position, while the displaced tokens have been Dragged from somewhere in the source text and Dropped somewhere in the target text.
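Since each of the three types carries a distinct literal in its edit field, they can be combined into a discriminated union and handled with a switch. The sketch below assumes a union named Diff and a helper named describe; both names are hypothetical and only serve as illustration.

```typescript
interface Token { text: string; id: string }
interface Dropped { edit: 'Dropped'; target: Token; id: string; manual: boolean }
interface Dragged { edit: 'Dragged'; source: Token; id: string; manual: boolean }
interface Edited { edit: 'Edited'; source: Token[]; target: Token[]; id: string; manual: boolean }

// Hypothetical name for the union of the three variants.
type Diff = Dropped | Dragged | Edited

/** Hypothetical helper: a one-line, human-readable summary of a diff item. */
function describe(d: Diff): string {
  // Switching on the `edit` tag narrows `d` to the matching variant in each case.
  switch (d.edit) {
    case 'Dropped':
      return `dropped "${d.target.text.trim()}" (id ${d.id})`
    case 'Dragged':
      return `dragged "${d.source.text.trim()}" (id ${d.id})`
    case 'Edited': {
      const src = d.source.map(t => t.text).join('').trim()
      const tgt = d.target.map(t => t.text).join('').trim()
      return `edited "${src}" -> "${tgt}" (id ${d.id})`
    }
  }
}
```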
Additionally, there is an enriched version that gives intra-token character diffs:
type RichDiff =
| Edited & {index: number} & {target_diffs: TokenDiff[]; source_diffs: TokenDiff[]}
| Dragged & {index: number} & {source_diff: TokenDiff}
| Dropped & {index: number} & {target_diff: TokenDiff}
type TokenDiff = [-1 | 0 | 1 /* deleted, unmodified, inserted */, string][]
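To illustrate how a TokenDiff can be read, the sketch below recovers the text on either side of a diff. It assumes the common reading of such tuples, where deleted and unmodified segments together spell the text before the change and inserted and unmodified segments spell the text after it; the helper names textBefore and textAfter are hypothetical, and the example diff for apa and bepa is hand-written rather than produced by the project.

```typescript
type TokenDiff = [-1 | 0 | 1 /* deleted, unmodified, inserted */, string][]

/** Hypothetical helper: text before the change (deleted + unmodified segments). */
function textBefore(diff: TokenDiff): string {
  return diff.filter(([op]) => op !== 1).map(([, s]) => s).join('')
}

/** Hypothetical helper: text after the change (inserted + unmodified segments). */
function textAfter(diff: TokenDiff): string {
  return diff.filter(([op]) => op !== -1).map(([, s]) => s).join('')
}

// Hand-written character diff turning "apa " into "bepa ".
const apaToBepa: TokenDiff = [[-1, 'a'], [1, 'be'], [0, 'pa ']]

console.log(JSON.stringify(textBefore(apaToBepa))) // "apa "
console.log(JSON.stringify(textAfter(apaToBepa)))  // "bepa "
```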