@@ -33,195 +33,137 @@ fn main() {
3333 {
3434 name: 'Hello',
3535 count: 42,
36- maybe: NaN
36+ maybe: null
3737 }
3838"#;
3939
4040 let parsed = from_str::<MyData>(source).unwrap();
41- let expected = MyData { name: "Hello".to_string(), count: 42, maybe: Some(NaN) }
42- assert_eq!(parsed, expected)
41+ let expected = MyData { name: "Hello".to_string(), count: 42, maybe: None };
42+ assert_eq!(parsed, expected);
4343}
4444```
45- ## Examples
46-
47- See the `examples/` directory for examples of programs that utilize round-tripping features.
48-
49- - `examples/json5-doublequote-fixer` gives an example of tokenization-based round-tripping edits
50- - `examples/json5-trailing-comma-formatter` gives an example of model-based round-tripping edits
51-
52- ## Benchmarking
53-
54- Benchmarks are available in the `benches/` directory. Test data is in the `data/` directory. A couple of benchmarks use
55- big files that are not committed to this repo. So run `./data/setupdata.sh` to download the required data files
56- so that you don't skip the big benchmarks. The benchmarks compare `json_five` (this crate) to
57- [serde_json](https://github.com/serde-rs/json) and [json5-rs](https://github.com/callum-oakley/json5-rs).
58-
59- Notwithstanding the general caveats of benchmarks, in initial testing, `json_five` outperforms `json5-rs`.
60- In typical scenarios: 3-4x performance, it seems. At time of writing (pre-v0) no performance optimizations have been done. I
61- expect performance to improve, if at least marginally, in the future.
62-
63- These benchmarks were run on Windows on an i9-10900K. This table won't be updated unless significant changes happen.
6445
65- | test | json_five | serde_json | json5 |
66- | --------------------| ---------------| ---------------| ---------------|
67- | big (25MB) | 580.31 ms | 150.39 ms | 3.0861 s |
68- | medium-ascii (5MB) | 199.88 ms | 59.008 ms | 706.94 ms |
69- | empty | 228.62 ns | 38.786 ns | 708.00 ns |
70- | arrays | 578.24 ns | 100.95 ns | 1.3228 µs |
71- | objects | 922.91 ns | 205.75 ns | 2.0748 µs |
72- | nested-array | 22.990 µs | 5.0483 µs | 29.356 µs |
73- | nested-objects | 50.659 µs | 14.755 µs | 132.75 µs |
74- | string | 421.17 ns | 91.051 ns | 3.5691 µs |
75- | number | 238.75 ns | 36.179 ns | 779.13 ns |
76-
77-
78-
79- # Round-trip model
80-
81- The `rt` module contains the round-trip parser. This is intended to be ergonomic for round-trip use cases, although
82- it is still very possible to use the default parser (which is more performance-oriented) for certain round-trip use cases.
83- The round-trip AST model produced by the round-trip parser includes additional `context` fields that describe the whitespace, comments,
84- and (where applicable) trailing commas on each production. Moreover, unlike the default parser, the AST consists
85- entirely of owned types, allowing for simplified in-place editing.
86-
87-
88- The `context` field holds a single field struct that contains the field `wsc` (meaning 'white space and comments')
89- which holds a tuple of `String`s that represent the contextual whitespace and comments. The last element in
90- the `wsc` tuple in the `context` of `JSONArrayValue` and `JSONKeyValuePair` objects is an `Option<String>` -- which
91- is used as a marker to indicate an optional trailing comma and any whitespace that may follow that optional comma.
92-
93- The `context` field is always an `Option`.
94-
95- Contexts are associated with the following structs (which correspond to the JSON5 productions) and their context layout:
96-
97- ## `rt::parser::JSONText`
98-
99- Represents the top-level Text production of a JSON5 document. It consists solely of a single (required) value.
100- It may have whitespace/comments before or after the value. The `value` field contains any `JSONValue` and the `context`
101- field contains the context struct containing the `wsc` field, a two-length tuple that describes the whitespace before and after the value.
102- In other words: `{ wsc.0 } value { wsc.1 }`
46+ Serializing also works in the usual way. The re-exported `to_string` function comes from the `ser` module and works
47+ as you'd expect, using default formatting.
10348
10449```rust
105- use json_five::rt::parser::from_str;
106- use json_five::rt::parser::JSONValue;
107-
108- let doc = from_str(" 'foo'\n").unwrap();
109- let context = doc.context.unwrap();
110-
111- assert_eq!(&context.wsc.0, " ");
112- assert_eq!(doc.value, JSONValue::SingleQuotedString("foo".to_string()));
113- assert_eq!(&context.wsc.1, "\n");
50+ use serde::Serialize;
51+ use json_five::to_string;
52+ #[derive(Serialize)]
53+ struct Test {
54+ int: u32,
55+ seq: Vec<&'static str>,
56+ }
57+ let test = Test {
58+ int: 1,
59+ seq: vec!["a", "b"],
60+ };
61+ let expected = r#"{"int": 1, "seq": ["a", "b"]}"#;
62+ assert_eq!(to_string(&test).unwrap(), expected);
11463```
11564
65+ You can also use the `to_string_formatted` function with a `FormatConfiguration` to control the output format, including
66+ indentation, trailing commas, and key/item separators.
11667
117- ## `rt::parser::JSONValue::JSONObject`
118-
119- Member of the `rt::parser::JSONValue` enum representing [JSON5 objects](https://spec.json5.org/#objects).
120-
121- There are two fields: `key_value_pairs`, which is a `Vec` of `JSONKeyValuePair`s, and `context` whose `wsc` is
122- a one-length tuple containing the whitespace/comments that occur after the opening brace. In non-empty objects,
123- the whitespace that precedes the closing brace is part of the last item in the `key_value_pairs` Vec.
124- In other words: `LBRACE { wsc.0 } [ key_value_pairs ] RBRACE`
125- and: `.context.wsc: (String,)`
126-
127- ### `rt::parser::KeyValuePair`
128-
129- The `KeyValuePair` struct represents the ['JSON5Member' production](https://spec.json5.org/#prod-JSON5Member).
130- It has three fields: `key`, `value`, and `context`. The `key` is a `JSONValue`, in practice limited to `JSONValue::Identifier`,
131- `JSONValue::DoubleQuotedString` or a `JSONValue::SingleQuotedString`. The `value` is any `JSONValue`.
132-
133- Its context describes whitespace/comments that are between the key
134- and `:`, between the `:` and the value, after the value, and (optionally) a trailing comma and whitespace trailing the
135- comma.
136- In other words, roughly: `key { wsc.0 } COLON { wsc.1 } value { wsc.2 } [ COMMA { wsc.3 } [ next_key_value_pair ] ]`
137- and: `.context.wsc: (String, String, String, Option<String>)`
138-
139- When `context.wsc.3` is `Some()`, it indicates the presence of a trailing comma (not included in the string) and
140- whitespace that follows the comma. This item MUST be `Some()` when it is not the last member in the object.
141-
142- ## `rt::parser::JSONValue::JSONArray`
143-
144- Member of the `rt::parser::JSONValue` enum representing [JSON5 arrays](https://spec.json5.org/#arrays).
145-
146- There are two fields on this struct: `values`, which is of type `Vec<JSONArrayValue>`, and `context` which holds
147- a one-length tuple containing the whitespace/comments that occur after the opening bracket. In non-empty arrays,
148- the whitespace that precedes the closing bracket is part of the last item in the `values` Vec.
149- In other words: `LBRACKET { wsc.0 } [ values ] RBRACKET`
150- and: `.context.wsc: (String,)`
151-
152-
153- ### `rt::parser::JSONArrayValue`
154-
155- The `JSONArrayValue` struct represents a single member of a JSON5 Array. It has two fields: `value`, which is any
156- `JSONValue`, and `context` which contains the contextual whitespace/comments around the member. The `context`'s `wsc`
157- field is a two-length tuple for the whitespace that may occur after the value and (optionally) after the comma following the value.
158- In other words, roughly: `value { wsc.0 } [ COMMA { wsc.1 } [ next_value ]]`
159- and: `.context.wsc: (String, Option<String>)`
160-
161- When `context.wsc.1` is `Some()` it indicates the presence of the comma (not included in the string) and any whitespace
162- following the comma is contained in the string. This item MUST be `Some()` when it is not the last member of the array.
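The array layout described above can be illustrated with a small, std-only sketch. The `Array` and `ArrayValue` types below are hypothetical stand-ins that copy only the `wsc` tuple shapes from the prose (`(String,)` and `(String, Option<String>)`); they are not the crate's `rt::parser` types, and `render_array` and `commas_are_valid` are illustrative helpers, not crate functions.

```rust
/// Hypothetical, simplified mirror of the round-trip array member described above.
/// Not the crate's real `JSONArrayValue`.
struct ArrayValue {
    /// The member's source text, kept verbatim (e.g. "1" or "'a'").
    value: String,
    /// (whitespace after the value, Some(whitespace after a comma) if a comma follows)
    wsc: (String, Option<String>),
}

/// Hypothetical mirror of the array production. Not the crate's real `JSONArray`.
struct Array {
    values: Vec<ArrayValue>,
    /// Whitespace/comments right after the opening bracket.
    wsc: (String,),
}

/// Rebuilds source text following the layout
/// `LBRACKET { wsc.0 } [ values ] RBRACKET`, where each member is
/// `value { wsc.0 } [ COMMA { wsc.1 } ]`.
fn render_array(arr: &Array) -> String {
    let mut out = String::from("[");
    out.push_str(&arr.wsc.0);
    for item in &arr.values {
        out.push_str(&item.value);
        out.push_str(&item.wsc.0);
        // `Some` marks a comma; the comma itself is not stored in the string.
        if let Some(after_comma) = &item.wsc.1 {
            out.push(',');
            out.push_str(after_comma);
        }
    }
    out.push(']');
    out
}

/// Checks the rule that every member except the last must be followed by a comma.
fn commas_are_valid(arr: &Array) -> bool {
    let n = arr.values.len();
    arr.values
        .iter()
        .take(n.saturating_sub(1))
        .all(|item| item.wsc.1.is_some())
}
```

For example, an array whose `wsc.0` is `" "` and whose members are `1` with `(" ", Some(" "))` and `2` with `("", Some("\n"))` renders back to `"[ 1 , 2,\n]"`. The whitespace before `]` lives in the last member, not in the array's own context.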
163-
164- ## Other `rt::parser::JSONValue`s
165-
68+ ```rust
69+ use serde::Serialize;
70+ use json_five::{to_string_formatted, FormatConfiguration, TrailingComma};
71+ #[derive(Serialize)]
72+ struct Test {
73+ int: u32,
74+ seq: Vec<&'static str>,
75+ }
76+ let test = Test {
77+ int: 1,
78+ seq: vec!["a", "b"],
79+ };
80+
81+ let config = FormatConfiguration::with_indent(4, TrailingComma::ALL);
82+ let formatted_doc = to_string_formatted(&test, config).unwrap();
83+
84+ let expected = r#"{
85+     "int": 1,
86+     "seq": [
87+         "a",
88+         "b",
89+     ],
90+ }"#;
91+
92+ assert_eq!(formatted_doc, expected);
93+ ```
16694
95+ ## Examples
16796
168- - `JSONValue::Integer(String)`
169- - `JSONValue::Float(String)`
170- - `JSONValue::Exponent(String)`
171- - `JSONValue::Null`
172- - `JSONValue::Infinity`
173- - `JSONValue::NaN`
174- - `JSONValue::Hexadecimal(String)`
175- - `JSONValue::Bool(bool)`
176- - `JSONValue::DoubleQuotedString(String)`
177- - `JSONValue::SingleQuotedString(String)`
178- - `JSONValue::Unary { operator: UnaryOperator, value: Box<JSONValue> }`
179- - `JSONValue::Identifier(String)` (for object keys only!).
97+ See the `examples/` directory for examples of programs that utilize round-tripping features.
18098
181- Where these enum members have `String`s, they represent the object as it was tokenized without any modifications (that
182- is, for example, without any escape sequences un-escaped). The single- and double-quoted `String`s do not include the surrounding
183- quote characters. These members alone have no `context`.
99+ - `examples/json5-doublequote-fixer` gives an example of tokenization-based round-tripping edits
100+ - `examples/json5-trailing-comma-formatter` gives an example of model-based round-tripping edits
184101
185- # round-trip tokenizer
186102
187- The `rt::tokenizer` module contains some useful tools for round-tripping tokens. The `Token`s produced by the
188- rt tokenizer are owned types containing the lexeme from the source. There are two key functions in the tokenizer module:
103+ # Benchmarking
189104
190- - `rt::tokenize::source_to_tokens`
191- - `rt::tokenize::tokens_to_source`
105+ Benchmarks are available in the `benches/` directory. Test data is in the `data/` directory. A couple of benchmarks use
106+ big files that are not committed to this repo, so run `./data/setupdata.sh` to download them; otherwise the
107+ big benchmarks are skipped. The benchmarks compare `json_five` (this crate) to
108+ [serde_json](https://github.com/serde-rs/json) and [json5-rs](https://github.com/callum-oakley/json5-rs).
192109
193- Each `Token` generated from `source_to_tokens` also contains some contextual information, such as line/col numbers, offsets, etc.
194- This contextual information is not required for `tokens_to_source` -- that is: you can create new tokens and insert them
195- into your tokens array and process those tokens back to JSON5 source without issue.
110+ Notwithstanding the general caveats of benchmarks, in initial testing, `json_five` definitively outperforms `json5-rs`.
111+ In typical scenarios it has been observed to be 3-4x faster, and up to 20x faster in some synthetic tests on extremely large data.
112+ At the time of writing (pre-v0) no performance optimizations have been done. I expect performance to improve,
113+ at least marginally, in the future.
114+
115+ These benchmarks were run on Windows on an i9-10900K with rustc 1.83.0 (90b35a623 2024-11-26). This table won't be updated unless significant changes happen.
116+
117+
118+ | test | json_five | json5 | serde_json |
119+ | ----------------------------| -----------| -----------| ------------|
120+ | big (25MB) | 580.31 ms | 3.0861 s | 150.39 ms |
121+ | medium-ascii (5MB) | 199.88 ms | 706.94 ms | 59.008 ms |
122+ | empty | 228.62 ns | 708.00 ns | 38.786 ns |
123+ | arrays | 578.24 ns | 1.3228 µs | 100.95 ns |
124+ | objects | 922.91 ns | 2.0748 µs | 205.75 ns |
125+ | nested-array | 22.990 µs | 29.356 µs | 5.0483 µs |
126+ | nested-objects | 50.659 µs | 132.75 µs | 14.755 µs |
127+ | string | 421.17 ns | 3.5691 µs | 91.051 ns |
128+ | number | 238.75 ns | 779.13 ns | 36.179 ns |
129+ | deserialize (size 10) | 6.9898 µs | 58.398 µs | 886.33 ns |
132+ | deserialize (size 100) | 66.005 µs | 830.79 µs | 9.9705 µs |
135+ | deserialize (size 1000) | 599.39 µs | 8.4952 ms | 69.110 µs |
138+ | deserialize (size 10000) | 5.9841 ms | 82.591 ms | 734.40 µs |
141+ | deserialize (size 100000) | 66.841 ms | 955.37 ms | 11.638 ms |
144+ | deserialize (size 1000000) | 674.13 ms | 9.5758 s | 119.03 ms |
147+ | serialize (size 10) | 2.3496 µs | 48.915 µs | 891.85 ns |
150+ | serialize (size 100) | 19.602 µs | 458.98 µs | 6.7109 µs |
153+ | serialize (size 1000) | 194.19 µs | 4.6035 ms | 62.667 µs |
156+ | serialize (size 10000) | 2.2104 ms | 47.253 ms | 761.10 µs |
159+ | serialize (size 100000) | 24.418 ms | 502.35 ms | 11.410 ms |
162+ | serialize (size 1000000) | 245.26 ms | 4.6211 s | 115.84 ms |
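The table mixes ns, µs, ms, and s, which makes the relative claims above hard to check at a glance. The sketch below is a hypothetical helper, not part of the crate's benchmark suite: it converts cells as written in the table to nanoseconds and divides them. It only re-derives ratios from the published figures and does not measure anything.

```rust
/// Converts a benchmark cell as written in the table (e.g. "580.31 ms",
/// "3.0861 s", "886.33ns") into nanoseconds. Returns `None` for an unknown unit
/// or an unparsable number.
fn to_nanos(cell: &str) -> Option<f64> {
    let cell = cell.trim();
    // The unit suffix starts at the first alphabetic character ('µ' is alphabetic).
    let idx = cell.find(|c: char| c.is_alphabetic())?;
    let (number, unit) = cell.split_at(idx);
    let value: f64 = number.trim().parse().ok()?;
    let scale = match unit {
        "ns" => 1.0,
        "µs" | "us" => 1e3,
        "ms" => 1e6,
        "s" => 1e9,
        _ => return None,
    };
    Some(value * scale)
}

/// Ratio of a slower timing to a faster one, i.e. "how many times faster".
fn speedup(slower: &str, faster: &str) -> Option<f64> {
    Some(to_nanos(slower)? / to_nanos(faster)?)
}
```

Applied to the `big (25MB)` row, `speedup("3.0861 s", "580.31 ms")` comes out at roughly 5.3x for `json_five` over `json5`, and the `serialize (size 1000000)` row gives roughly 18.8x.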
196165
197- The `tok_type` attribute leverages the same `json_five::tokenize::TokType` types. Those are:
198166
199- - `LeftBrace`
200- - `RightBrace`
201- - `LeftBracket`
202- - `RightBracket`
203- - `Comma`
204- - `Colon`
205- - `Name` (Identifiers)
206- - `SingleQuotedString`
207- - `DoubleQuotedString`
208- - `BlockComment`
209- - `LineComment` note: the lexeme includes the singular trailing newline, if present (e.g., not a comment just before EOF with no newline at end of file)
210- - `Whitespace`
211- - `True`
212- - `False`
213- - `Null`
214- - `Integer`
215- - `Float`
216- - `Infinity`
217- - `Nan`
218- - `Exponent`
219- - `Hexadecimal`
220- - `Plus`
221- - `Minus`
222- - `EOF`
223-
224- Note: string tokens will include surrounding quotes.
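The round-trip property described above (token lexemes concatenate back to the exact source, and hand-made tokens need no position information) can be sketched without the crate. `TokType`, `Token`, `join_lexemes`, and `add_trailing_comma` below are hypothetical, trimmed-down stand-ins modeled on the prose, not the `rt::tokenize` API; the real `TokType` has the full variant list given above.

```rust
/// Hypothetical, trimmed-down token kinds (the real `TokType` has many more variants).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokType {
    LeftBracket,
    RightBracket,
    Comma,
    Integer,
    Whitespace,
}

/// Hypothetical owned token: the exact lexeme plus optional position info.
#[derive(Debug, Clone)]
struct Token {
    tok_type: TokType,
    lexeme: String,
    /// Line number from the tokenizer; `None` for tokens created by hand.
    line: Option<usize>,
}

/// Because whitespace and comments are tokens too, concatenating every lexeme
/// reproduces the source exactly. Position info is ignored.
fn join_lexemes(tokens: &[Token]) -> String {
    tokens.iter().map(|t| t.lexeme.as_str()).collect()
}

/// Inserts a hand-made comma token after the last significant token before the
/// final `]`, unless the array is empty or already ends with a comma.
fn add_trailing_comma(tokens: &mut Vec<Token>) {
    let Some(close) = tokens.iter().rposition(|t| t.tok_type == TokType::RightBracket) else {
        return;
    };
    // Skip backwards over whitespace to find the last significant token.
    let Some(last) = tokens[..close]
        .iter()
        .rposition(|t| t.tok_type != TokType::Whitespace)
    else {
        return;
    };
    if matches!(tokens[last].tok_type, TokType::Comma | TokType::LeftBracket) {
        return;
    }
    let comma = Token { tok_type: TokType::Comma, lexeme: ",".to_string(), line: None };
    tokens.insert(last + 1, comma);
}
```

The inserted comma carries no line information, yet joining the lexemes still yields valid source, which is the point the prose makes about `tokens_to_source` not needing contextual information.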
225167
226168
227169# Notes