|
2 | 2 |
|
3 | 3 | Natural language for human and machine.
|
4 | 4 |
|
5 |
| ---- |
| 5 | +**NLCST** discloses the parts of natural language as a concrete syntax |
| 6 | +tree. Concrete means all information is stored in this tree and an |
| 7 | +exact replica of the original document can be re-created. |
6 | 8 |
|
7 |
| -> Note: Several projects use this document. Do not make changes without consulting with [TextOM](https://github.com/wooorm/textom), [parse-latin](https://github.com/wooorm/parse-latin), and [retext](https://github.com/wooorm/retext). |
| 9 | +**NLCST** is a subset of [**Unist**][unist], and implemented by |
| 10 | +[**retext**][retext]. |
8 | 11 |
|
9 |
| -## CST |
10 |
| - |
11 |
| -### Node |
| 12 | +## Table of Contents |
12 | 13 |
|
13 |
| -Node represents any unit in NLCST hierarchy. |
| 14 | +- [CST](#cst) |
14 | 15 |
|
15 |
| -``` |
16 |
| -interface Node { |
17 |
| - type: string; |
18 |
| - data: Data | null; |
19 |
| -} |
20 |
| -``` |
| 16 | + - [Root](#root) |
| 17 | + - [Paragraph](#paragraph) |
| 18 | + - [Sentence](#sentence) |
| 19 | + - [Word](#word) |
| 20 | + - [Symbol](#symbol) |
| 21 | + - [Punctuation](#punctuation) |
| 22 | + - [WhiteSpace](#whitespace) |
| 23 | + - [Source](#source) |
| 24 | + - [TextNode](#textnode) |
21 | 25 |
|
22 |
| -### Data |
| 26 | +- [List of Utilities](#list-of-utilities) |
23 | 27 |
|
24 |
| -Data represents data associated with any node. Data is a scope for plug-ins to store any information. Its only limitation being that each property should by stringifyable: not throw when passed to `JSON.stringify()`. |
| 28 | +- [License](#license) |
25 | 29 |
|
26 |
| -``` |
27 |
| -interface Data { } |
28 |
| -``` |
| 30 | +## CST |
29 | 31 |
|
30 |
| -### Parent |
| 32 | +### `Root` |
31 | 33 |
|
32 |
| -Parent ([Node](#node)) represents a unit in NLCST hierarchy which can have zero or more children. |
| 34 | +`Root` ([`Parent`][parent]) houses all nodes. |
33 | 35 |
|
34 |
| -``` |
35 |
| -interface Parent <: Node { |
36 |
| - children: []; |
| 36 | +```idl |
| 37 | +interface Root <: Parent { |
| 38 | + type: "RootNode"; |
37 | 39 | }
|
38 | 40 | ```
|
39 | 41 |
|
40 |
| -### Text |
| 42 | +### `Paragraph` |
41 | 43 |
|
42 |
| -Text ([Node](#node)) represents a unit in NLCST hierarchy which has value. |
| 44 | +`Paragraph` ([`Parent`][parent]) represents a self-contained unit of |
| 45 | +discourse in writing dealing with a particular point or idea. |
43 | 46 |
|
44 |
| -``` |
45 |
| -interface Text <: Node { |
46 |
| - value: string; |
| 47 | +```idl |
| 48 | +interface Paragraph <: Parent { |
| 49 | + type: "ParagraphNode"; |
47 | 50 | }
|
48 | 51 | ```
|
49 | 52 |
|
50 |
| -### RootNode |
| 53 | +### `Sentence` |
51 | 54 |
|
52 |
| -Root ([Parent](#parent)) represents a document. |
| 55 | +`Sentence` ([`Parent`][parent]) represents a grouping of grammatically |
| 56 | +linked words that in principle expresses a complete thought, although it |
| 57 | +may make little sense when taken out of context. |
53 | 58 |
|
54 |
| -``` |
55 |
| -interface RootNode < Parent { |
56 |
| - type: "RootNode"; |
| 59 | +```idl |
| 60 | +interface Sentence <: Parent { |
| 61 | + type: "SentenceNode"; |
57 | 62 | }
|
58 | 63 | ```
|
59 | 64 |
|
60 |
| -### ParagraphNode |
| 65 | +### `Word` |
61 | 66 |
|
62 |
| -Paragraph ([Parent](#parent)) represents a self-contained unit of discourse in writing dealing with a particular point or idea. |
| 67 | +`Word` ([`Parent`][parent]) represents the smallest element that may |
| 68 | +be uttered in isolation with semantic or pragmatic content. |
63 | 69 |
|
64 |
| -``` |
65 |
| -interface ParagraphNode < Parent { |
66 |
| - type: "ParagraphNode"; |
| 70 | +```idl |
| 71 | +interface Word <: Parent { |
| 72 | + type: "WordNode"; |
67 | 73 | }
|
68 | 74 | ```
|
69 | 75 |
|
70 |
| -### SentenceNode |
| 76 | +### `Symbol` |
71 | 77 |
|
72 |
| -Sentence ([Parent](#parent)) represents grouping of grammatically linked words, that in principle tells a complete thought, although it may make little sense taken in isolation out of context. |
| 78 | +`Symbol` ([`Text`][text]) represents typographical devices like |
| 79 | +white space, punctuation, signs, and more, different from characters |
| 80 | +which represent sounds (like letters and numerals). |
73 | 81 |
|
74 |
| -``` |
75 |
| -interface SentenceNode < Parent { |
76 |
| - type: "SentenceNode"; |
| 82 | +```idl |
| 83 | +interface Symbol <: Text { |
| 84 | + type: "SymbolNode"; |
77 | 85 | }
|
78 | 86 | ```
|
79 | 87 |
|
80 |
| -### WordNode |
| 88 | +### `Punctuation` |
81 | 89 |
|
82 |
| -Word ([Parent](#parent)) represents the smallest element that may be uttered in isolation with semantic or pragmatic content. |
| 90 | +`Punctuation` ([`Symbol`][symbol]) represents typographical devices |
| 91 | +which aid understanding and correct reading of other grammatical |
| 92 | +units. |
83 | 93 |
|
84 |
| -``` |
85 |
| -interface WordNode < Parent { |
86 |
| - type: "WordNode"; |
| 94 | +```idl |
| 95 | +interface Punctuation <: Symbol { |
| 96 | + type: "PunctuationNode"; |
87 | 97 | }
|
88 | 98 | ```
|
89 | 99 |
|
90 |
| -### SymbolNode |
| 100 | +### `WhiteSpace` |
91 | 101 |
|
92 |
| -Symbol ([Text](#text)) represents typographical devices like white space, punctuation, signs, and more, different from characers which represent sounds (like letters and numerals). |
| 102 | +`WhiteSpace` ([`Symbol`][symbol]) represents typographical devices |
| 103 | +devoid of content, separating other grammatical units. |
93 | 104 |
|
94 |
| -``` |
95 |
| -interface SymbolNode < Text { |
96 |
| - type: "SymbolNode"; |
| 105 | +```idl |
| 106 | +interface WhiteSpace <: Symbol { |
| 107 | + type: "WhiteSpaceNode"; |
97 | 108 | }
|
98 | 109 | ```
|
99 | 110 |
|
100 |
| -### PunctuationNode |
| 111 | +### `Source` |
101 | 112 |
|
102 |
| -Punctuation ([SymbolNode](#symbolnode)) represents typographical devices which aid understanding and correct reading of other grammatical units. |
| 113 | +`Source` ([`Text`][text]) represents an external (ungrammatical) value |
| 114 | +embedded into a grammatical unit: a hyperlink, a line, and such. |
103 | 115 |
|
104 |
| -``` |
105 |
| -interface PunctuationNode < SymbolNode { |
106 |
| - type: "PunctuationNode"; |
| 116 | +```idl |
| 117 | +interface Source <: Text { |
| 118 | + type: "SourceNode"; |
107 | 119 | }
|
108 | 120 | ```
|
109 | 121 |
|
110 |
| -### WhiteSpaceNode |
| 122 | +### `TextNode` |
111 | 123 |
|
112 |
| -White Space ([SymbolNode](#symbolnode)) represents typographical devices devoid of content, separating other grammatical units. |
| 124 | +`TextNode` ([`Text`][text]) represents actual content in an NLCST |
| 125 | +document: one or more characters. Note that its `type` property |
| 126 | +is `TextNode`, but it is different from the abstract [`Text`][text] |
| 127 | +interface. |
113 | 128 |
|
114 |
| -``` |
115 |
| -interface WhiteSpaceNode < SymbolNode { |
116 |
| - type: "WhiteSpaceNode"; |
| 129 | +```idl |
| 130 | +interface TextNode <: Text { |
| 131 | + type: "TextNode"; |
117 | 132 | }
|
118 | 133 | ```
|
119 | 134 |
|
120 |
| -### SourceNode |
| 135 | +## List of Utilities |
121 | 136 |
|
122 |
| -Source ([Text](#text)) represents an external (ungrammatical) value embedded into a grammatical unit: a hyperlink, a line, and such. |
| 137 | +<!--lint disable list-item-spacing--> |
123 | 138 |
|
124 |
| -``` |
125 |
| -interface SourceNode < Text { |
126 |
| - type: "SourceNode"; |
127 |
| -} |
128 |
| -``` |
| 139 | +- [`wooorm/nlcst-is-literal`](https://github.com/wooorm/nlcst-is-literal) |
| 140 | + — Check whether a node is meant literally; |
| 141 | +- [`wooorm/nlcst-normalize`](https://github.com/wooorm/nlcst-normalize) |
| 142 | + — Normalize a word for easier comparison; |
| 143 | +- [`wooorm/nlcst-search`](https://github.com/wooorm/nlcst-search) |
| 144 | + — Search for patterns in an NLCST tree; |
| 145 | +- [`wooorm/nlcst-to-string`](https://github.com/wooorm/nlcst-to-string) |
| 146 | + — Stringify a node; |
| 147 | +- [`wooorm/nlcst-test`](https://github.com/wooorm/nlcst-test) |
| 148 | + — Validate an NLCST node. |
129 | 149 |
|
130 |
| -### TextNode |
| 150 | +In addition, see [**Unist**][unist] for other utilities which |
| 151 | +work with **retext** nodes. |
131 | 152 |
|
132 |
| -Text ([Text](#text)) represents actual content in an NLCST document: one or more characters. |
| 153 | +## License |
133 | 154 |
|
134 |
| -``` |
135 |
| -interface TextNode < Text { |
136 |
| - type: "TextNode"; |
137 |
| -} |
138 |
| -``` |
| 155 | +MIT © Titus Wormer |
139 | 156 |
|
140 |
| -## Related |
| 157 | +<!--Definitions--> |
141 | 158 |
|
142 |
| -- [retext](https://github.com/wooorm/retext) — Analyse and Manipulate natural language, 20+ plug-ins. |
143 |
| -- [parse-latin](https://github.com/wooorm/parse-latin) — Transforms latin-script natural language into a CST; |
144 |
| -- [TextOM](https://github.com/wooorm/textom) — Provides an object-oriented manipulation interface to NLCST; |
145 |
| -- [nlcst-to-string](https://github.com/wooorm/nlcst-to-string) — Transforms a CST into a string; |
146 |
| -- [nlcst-to-textom](https://github.com/wooorm/nlcst-to-textom) — Transforms a CST into a [TextOM](https://github.com/wooorm/textom) object model; |
147 |
| -- [nlcst-test](https://github.com/wooorm/nlcst-test) — Validate an NLCST node. |
| 159 | +[unist]: https://github.com/wooorm/unist |
148 | 160 |
|
149 |
| -## License |
| 161 | +[retext]: https://github.com/wooorm/retext |
150 | 162 |
|
151 |
| -MIT © Titus Wormer |
| 163 | +[parent]: https://github.com/wooorm/unist#parent |
| 164 | + |
| 165 | +[text]: https://github.com/wooorm/unist#text |
| 166 | + |
| 167 | +[symbol]: #symbol |
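
For reviewers, the node shapes introduced in this diff can be sketched in TypeScript. This is only an illustration under stated assumptions: the names `TextLike`, `ParentLike`, `NlcstNode`, and the helper `toText` are hypothetical and do not appear in the specification; only the `type` strings are taken from it. The sketch shows the "concrete" property described in the introduction: joining the leaf values in order re-creates the original text.

```typescript
// Hypothetical names for illustration; only the `type` strings come from the spec.
interface TextLike {
  type:
    | "TextNode"
    | "SymbolNode"
    | "PunctuationNode"
    | "WhiteSpaceNode"
    | "SourceNode";
  value: string;
}

interface ParentLike {
  type: "RootNode" | "ParagraphNode" | "SentenceNode" | "WordNode";
  children: NlcstNode[];
}

type NlcstNode = TextLike | ParentLike;

// Walk the tree depth first and concatenate every leaf value.
function toText(node: NlcstNode): string {
  if ("children" in node) {
    return node.children.map(toText).join("");
  }
  return node.value;
}

// A hand-written tree for the text "Hello, world!".
const tree: NlcstNode = {
  type: "RootNode",
  children: [
    {
      type: "ParagraphNode",
      children: [
        {
          type: "SentenceNode",
          children: [
            { type: "WordNode", children: [{ type: "TextNode", value: "Hello" }] },
            { type: "PunctuationNode", value: "," },
            { type: "WhiteSpaceNode", value: " " },
            { type: "WordNode", children: [{ type: "TextNode", value: "world" }] },
            { type: "PunctuationNode", value: "!" },
          ],
        },
      ],
    },
  ],
};
```

Calling `toText(tree)` on this hand-built tree yields `"Hello, world!"`. Because white space and punctuation are stored as nodes of their own, no character of the input is lost.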