Add feature to add spans tracking source positions in AST nodes #373

Ertanic · 2024-11-16T16:41:51Z

I needed a parser for .ftl files. I found tree-sitter-fluent, but for some reason it couldn't parse a valid file, throwing errors when trying to use replaceable expressions. Decided to use fluent-syntax, but why does the javascript version have node spans but the rust version does not. This PR solves this issue, but since there is usually no need for spans, I hid them behind the spans feature.

And also to avoid conflicts in tests, because there the tree is formatted, because of which the spans change, the implementation version of PartialEq for AST nodes was divided into a derive implementation and a manual one.

The good idea is to write tests to match the spans, but I'm not sure how best to do that, I need help with this.

alerque · 2024-11-16T19:03:04Z

This would address #270, no?

Have you run any benchmarks with/without this feature enabled?

Ertanic · 2024-11-16T19:29:45Z

This would address #270, no?

I've been looking in only open PRs.

Have you run any benchmarks with/without this feature enabled?

bench	default	`spans` feature
construct/preferences	26.619 µs	24.673 µs
resolve/preferences	15.104 µs	14.990 µs
resolve_to_str/preferences	22.803 µs	22.902 µs
parse_ctx_runtime/preferences	300.13 µs	305.19 µs
parse_ctx_runtime/browser	120.29 µs	145.79 µs

zbraniecki

all in all, happy to see that! I definitely was hoping we'll gain this feature one day. Thank you for contributing!

My only uber-concern is that you use PartialEq to compare elements with different span. I'm not sure how canonical it is to use PartialEq this way. We wrangled with the meaning and role of PartialEq in ICU4X for a long time and I'm still not sure how to handle it but in ICU4X we decided to stay on the cautious side and introduce a function like cmp_value to allow for comparisons that exclude part of the value.
Span can be thought of as a metadata or part of the element and with the trait use you make it impossible to compare with spans.

I'm not sure what's the right way around it and if canonically ASTs in Rust (or other languages) use, and it is not uncommon to compare skipping spans, I'm fine with doing the same.

fluent-syntax/src/serializer.rs

Ertanic · 2024-11-17T17:47:24Z

I wouldn't want to use a manual implementation of PartialEq, but I've found this to be the most optimal solution. I found several solutions on the Internet, including the derivative crate, where it is possible to ignore a particular field when using #[derive(Derivative), derivative(PartialEq)], but I didn't want to drag additional dependencies for the sake of it. Although it is much more convenient, because when you change the structure's composition, you won't have to worry about supporting manual implementation of PartialEq.

Span itself implements PartialEq so that it can be compared to others. I was looking at tree-sitter, where the range of a node is provided through the corresponding function.

And I don't quite understand your point. Are you proposing to introduce additional methods for fields to compare structures and their fields? Or to compare all fields of node structures separately from PartialEq and Eq in separate methods?

Again, it's all for the sake of passing some tests that receive one ftl as input, then serialize it into the formatted ftl format and parse it again, so you get different spans for nodes. Either change the input data of the tests, which I think is wrong, or supplement the serializer so that it builds ftl content by spans, or just separate the implementation of comparison. I don't know, I chose the easiest option, as I needed it urgently in my lsp server, and I don't have any problems with it so far.

zbraniecki · 2024-11-18T06:35:20Z

And I don't quite understand your point. Are you proposing to introduce additional methods for fields to compare structures and their fields? Or to compare all fields of node structures separately from PartialEq and Eq in separate methods?

I'm raising a concern that semantically the following code should pass:

let node1 = Node {
  value: "foo",
  span: span!(0, 4),
};

let node2 = Node {
  value: "foo",
  span: span!(5, 11),
};
assert_ne!(node1, node2);

because those two nodes are not equal. Their content is different.

Now, what is true is that in most cases we care about the actual content of the node, not its meta information. We can explicitly achieve that by doing:

let node1 = Node {
  value: "foo",
  span: span!(0, 4),
};

let node2 = Node {
  value: "foo",
  span: span!(5, 11),
};
assert_ne!(node1, node2);

// Option 1:
assert_eq!(node1.content, node2.content);

// Option 2:
assert_eq!(node1.cmp_content(&node2));

Or we can do what is proposed in the PR and add:

let node1 = Node {
  value: "foo",
  span: span!(0, 4),
};

let node2 = Node {
  value: "foo",
  span: span!(5, 11),
};
assert_eq!(node1, node2);

assert_ne!(node1.span, node2.span);
assert_eq!(node1.cmp_span(&node2));

I'm not sure what is the most common approach to AST comparisons with spans. I'd suggest checking prior art in other parser/AST/serializer models.

Ertanic · 2025-01-25T13:50:22Z

I apologize for missing, been a bit busy. You're right that spans are part of the object metadata, so I removed the weird manual implementation of the comparison and just pre-formatted the actual version with the correct spans that are compared in the test.

Ertanic · 2025-03-08T12:30:41Z

Okay...

zbraniecki

@alerque feel free to decide if you want to merge or agree that we should use Range here.
The rest looks good to me! Thank you @Ertanic!

fluent-syntax/src/ast/mod.rs

zbraniecki

This looks good to me, except that I wouldn't suggest enabling spans by default - the default use is to parse for runtime use of the AST, not for editor cycle.

I also like the wrapping of Range because it allows us to swap for the new Range type once it stabilizes and enable Copy - https://hackmd.io/@uhs6rVdLTSS0gnie4q0fqA/ryjYJW2pa

fluent-syntax/Cargo.toml

Ertanic · 2025-03-11T18:58:46Z

I'm aware of the feature, I fixed it a long time ago, but the commit hung

Ertanic · 2025-03-12T07:32:45Z

I don't know why github can't output my commit, but fork has it: 460c365

Ertanic · 2025-03-17T10:32:18Z

Had to play around with reverting to fix the github. @zbraniecki

alerque · 2025-05-20T23:10:58Z

Is there a Git attribute we need to set to make checking out the new fixtures that explicitly test carriage returns from getting smudged on checkout?

Ertanic · 2025-05-21T07:55:45Z

I couldn't think of anything better than just disabling the documentation tests. Otherwise it's too much work to modify the documentation, which I think will only make it harder to read. From the suggestions to create a separate task for testing documentation, skipping the spans feature.

…odes

alerque · 2025-05-21T09:46:58Z

fluent-syntax/Cargo.toml

@@ -63,3 +64,6 @@ required-features = ["json"]
 name = "parser_fixtures"
 path = "tests/parser_fixtures.rs"
 required-features = ["json"]
+
+[lib]
+doctest = false


@Ertanic Why do you still think this exclusion is necessary given that spans is no longer a default feature?

Because the test task is started with the --all-features flag.

fluent-syntax/src/ast/helper.rs

Ertanic · 2025-05-21T10:35:50Z

I've tried fixing the error during the parse_fixtures_compare test, even adding spans to the json files, but it still keeps generating the error of no span field in the json file.

alerque · 2025-05-21T13:24:07Z

Please don't do any merging into this branch. I've force pushed several times to clean up the history and you keep pushing the dirty commits back into the timeline. Do feel free to work on it if there is something to do, but before committing either git pull --rebase to work from where this branch is (or git reset --hard <COMMIT> to the lastest commit you see here) and then just commit the changes on top of that. I can selectively squash fixes into the appropriate previous commits much easier than unraveling the cruft that comes in when you merge from your own outdated history. Thanks.

…normalized_fixtures()` ...and compare with expected span data.

…e<usize>`

alerque · 2025-05-21T15:05:23Z

Somehow the new fixture layout for un-normalized tests is running afoul of this heuristic:

https://github.com/projectfluent/fluent-rs/blob/main/fluent-syntax/src/serializer.rs#L425-L429

I can actually pass tests by not adding the extra carriage return there, but I also don't see a test for the "rare edge case" that is supposed to fix in the first place, so just bypassing it doesn't seem like a good plan.

alerque · 2025-05-21T21:53:01Z

fluent-syntax/tests/serializer_fixtures.rs

+        let formatted_path = path.parent().unwrap().join("formatted");
+
+        let formatted_content = fs::read_to_string(
+            formatted_path
+                .join("junked")
+                .join(path.file_name().unwrap()),
+        )
+        .unwrap();
+        assert_eq!(formatted_content, reserialized);
+        let formatted = parse(formatted_content.as_str()).unwrap_or_else(|(res, _)| res);
+
+        let formatted_content_without_junk =
+            fs::read_to_string(formatted_path.join(path.file_name().unwrap())).unwrap();
+        assert_eq!(formatted_content_without_junk, reserialized_without_junk);
+        let formatted_without_junk =
+            parse(formatted_content_without_junk.as_str()).unwrap_or_else(|(res, _)| res);


The more I fiddle with it trying to figure out what's going on with tests the more I think this approach to having expanded files for just the serialized message is not the right approach to testing. We should be able to test the original JSON fixtures and just use a query engine of some kind to extract the message bit and work around having serialized position data in our actual results.

With all these expanded and multiple entries in different formats in becomes much harder to audit what tests are actually doing and compare with other implementations.

alerque

Is there anywhere here the span results are actually being tested? Mostly I see the tests modified to try to dodge the span info, but we could use at least a couple inputs with the expected position data tested.

See my other comment about the pre-formated test results. I'm a bit skeptical about that being the most maintainable solution, especially since we can't even work out the line ending handling on the file reads. Having the expectation in JSON (as much as I actually despise JSON) does have the advantage of being unambiguous about what the EOL/EOF business is. How would those be updated? Can we backtrack and just use the JSON expectations and figure out how to query the relevant bits we need out of the upstream fixture expectations vs. our serialized data?

…dtrip_unnormalized_fixtures()`" This reverts commit 0756bd1.

alerque · 2025-05-22T11:41:07Z

I was pretty excited to get this feature landed before cutting a release, but I think I'm going to have to delay it. Hopefully we don't have to delay it long—I have no objection to doing a new release on a relatively short timeline as soon as this is actually ready.

My concerns are mostly related to testing, but I was playing around with fixing the tests and it feels like something is wrong with the scruct itself. Enabling the feature (or not) should not immediately break all existing usage of the serialization. That would make it very hard for existing systems with serialized data to migrate. If possible I would really like existing apps not to break the moment somebody tries enabling the feature. That may not be quite possible, but we should at least think through what the ramifications are and document how/why a change needs to be made to keep using existing code easily on the new version.

Until that gets thought through and tests pass both with and without the feature enabled I can't reasonably merge this, and since we have problems with old dependencies starting to hinder usage I kind of need to get an update out to address that.

Again I do want this feature to land and am willing to facilitate a release cycle special for it as soon as we're actually comfortable with the upgrade path if there are any breaking changes and tests work well enough to rely on.

…y trait

Ertanic · 2025-05-23T06:53:41Z

I do not know how to fix the roundtrip_unnormalized_fixtures test, so I suggest hiding it for now behind the cfg(not(feature = "spans")) attribute?

zbraniecki reviewed Nov 17, 2024

View reviewed changes

Ertanic requested a review from zbraniecki February 2, 2025 20:22

alerque self-requested a review February 17, 2025 12:49

zbraniecki reviewed Mar 8, 2025

View reviewed changes

fluent-syntax/src/ast/mod.rs Outdated Show resolved Hide resolved

zbraniecki reviewed Mar 11, 2025

View reviewed changes

fluent-syntax/Cargo.toml Outdated Show resolved Hide resolved

alerque self-assigned this May 20, 2025

alerque force-pushed the span branch 2 times, most recently from e3722f0 to bbab212 Compare May 21, 2025 09:02

alerque added enhancement crate:fluent-syntax Issues related to fluent-syntax crate crate:fluent-bundle Issues related to fluent-bundle crate labels May 21, 2025

feat(fluent-bundle,fluent-syntax): Add source location spans to AST n…

c4fa6ad

…odes

alerque force-pushed the span branch from bbab212 to 4869050 Compare May 21, 2025 09:41

alerque reviewed May 21, 2025

View reviewed changes

fluent-syntax/src/ast/helper.rs Outdated Show resolved Hide resolved

alerque force-pushed the span branch from 4869050 to 33d2d38 Compare May 21, 2025 09:56

alerque linked an issue May 21, 2025 that may be closed by this pull request

Expose spans in parser / source positions in the AST #270

Open

alerque changed the title ~~Added spans to AST nodes~~ Add feature to add spans tracking source positions in AST nodes May 21, 2025

alerque mentioned this pull request May 21, 2025

Bump deps and prep for release #382

Merged

alerque mentioned this pull request May 21, 2025

Release all affected crates in workspace #383

Merged

alerque force-pushed the span branch from 9e0a324 to 391cc9c Compare May 21, 2025 13:25

Ertanic and others added 7 commits May 21, 2025 16:42

refactor(fluent-syntax): Derive PartialEq, drop manual implementation

57df84d

refactor(fluent-syntax): Use pre-formatted messages for `roundtrip_un…

0756bd1

…normalized_fixtures()` ...and compare with expected span data.

Merge branch 'projectfluent:main' into span

ed45428

refactor(fluent-syntax): Make Span a wrapper around `std::ops::Rang…

1d41394

…e<usize>`

test(fluent-syntax): Update test handling to support spans

7346a0f

Merge branch 'master' into span

fc263b7

docs: Add changelog entries covering spans PR

61eadef

alerque force-pushed the span branch from a9e595b to 61eadef Compare May 21, 2025 13:43

alerque reviewed May 21, 2025

View reviewed changes

alerque requested changes May 21, 2025

View reviewed changes

Revert "refactor(fluent-syntax): Use pre-formatted messages for `roun…

f73665a

…dtrip_unnormalized_fixtures()`" This reverts commit 0756bd1.

alerque and others added 4 commits May 22, 2025 14:43

fixup! wrap Range...use clone to keep serializing working without Cop…

1a16d2c

…y trait

fixup! Create generic ftl serializer (projectfluent#241)

c7c120a

Fix parse_fixtures_compare test

cf2c427

Fix parse_bench_fixtures test

adae4a0

Add feature to add spans tracking source positions in AST nodes #373

Are you sure you want to change the base?

Add feature to add spans tracking source positions in AST nodes #373

Uh oh!

Conversation

Ertanic commented Nov 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alerque commented Nov 16, 2024

Uh oh!

Ertanic commented Nov 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zbraniecki left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Ertanic commented Nov 17, 2024

Uh oh!

zbraniecki commented Nov 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Ertanic commented Jan 25, 2025

Uh oh!

Ertanic commented Mar 8, 2025

Uh oh!

zbraniecki left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

zbraniecki left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Ertanic commented Mar 11, 2025

Uh oh!

Ertanic commented Mar 12, 2025

Uh oh!

Ertanic commented Mar 17, 2025

Uh oh!

alerque commented May 20, 2025

Uh oh!

Ertanic commented May 21, 2025

Uh oh!

alerque May 21, 2025

Choose a reason for hiding this comment

Uh oh!

Ertanic May 21, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Ertanic commented May 21, 2025

Uh oh!

alerque commented May 21, 2025

Uh oh!

alerque commented May 21, 2025

Uh oh!

alerque May 21, 2025

Choose a reason for hiding this comment

Uh oh!

alerque left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alerque commented May 22, 2025

Uh oh!

Ertanic commented May 23, 2025

Uh oh!

Uh oh!

Ertanic commented Nov 16, 2024 •

edited

Loading

Ertanic commented Nov 16, 2024 •

edited

Loading

zbraniecki commented Nov 18, 2024 •

edited

Loading

alerque left a comment •

edited

Loading