Skip to content

Add feature to add spans tracking source positions in AST nodes #373

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

Ertanic
Copy link

@Ertanic Ertanic commented Nov 16, 2024

I needed a parser for .ftl files. I found tree-sitter-fluent, but for some reason it couldn't parse a valid file, throwing errors when trying to use replaceable expressions. Decided to use fluent-syntax, but why does the javascript version have node spans but the rust version does not. This PR solves this issue, but since there is usually no need for spans, I hid them behind the spans feature.

And also to avoid conflicts in tests, because there the tree is formatted, because of which the spans change, the implementation version of PartialEq for AST nodes was divided into a derive implementation and a manual one.

The good idea is to write tests to match the spans, but I'm not sure how best to do that, I need help with this.

@alerque
Copy link
Collaborator

alerque commented Nov 16, 2024

This would address #270, no?

Have you run any benchmarks with/without this feature enabled?

@Ertanic
Copy link
Author

Ertanic commented Nov 16, 2024

This would address #270, no?

I've been looking in only open PRs.

Have you run any benchmarks with/without this feature enabled?

bench default spans feature
construct/preferences 26.619 µs 24.673 µs
resolve/preferences 15.104 µs 14.990 µs
resolve_to_str/preferences 22.803 µs 22.902 µs
parse_ctx_runtime/preferences 300.13 µs 305.19 µs
parse_ctx_runtime/browser 120.29 µs 145.79 µs

Copy link
Collaborator

@zbraniecki zbraniecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all in all, happy to see that! I definitely was hoping we'll gain this feature one day. Thank you for contributing!

My only uber-concern is that you use PartialEq to compare elements with different span. I'm not sure how canonical it is to use PartialEq this way. We wrangled with the meaning and role of PartialEq in ICU4X for a long time and I'm still not sure how to handle it but in ICU4X we decided to stay on the cautious side and introduce a function like cmp_value to allow for comparisons that exclude part of the value.
Span can be thought of as a metadata or part of the element and with the trait use you make it impossible to compare with spans.

I'm not sure what's the right way around it and if canonically ASTs in Rust (or other languages) use, and it is not uncommon to compare skipping spans, I'm fine with doing the same.

@Ertanic
Copy link
Author

Ertanic commented Nov 17, 2024

I wouldn't want to use a manual implementation of PartialEq, but I've found this to be the most optimal solution. I found several solutions on the Internet, including the derivative crate, where it is possible to ignore a particular field when using #[derive(Derivative), derivative(PartialEq)], but I didn't want to drag additional dependencies for the sake of it. Although it is much more convenient, because when you change the structure's composition, you won't have to worry about supporting manual implementation of PartialEq.

Span itself implements PartialEq so that it can be compared to others. I was looking at tree-sitter, where the range of a node is provided through the corresponding function.

And I don't quite understand your point. Are you proposing to introduce additional methods for fields to compare structures and their fields? Or to compare all fields of node structures separately from PartialEq and Eq in separate methods?

Again, it's all for the sake of passing some tests that receive one ftl as input, then serialize it into the formatted ftl format and parse it again, so you get different spans for nodes. Either change the input data of the tests, which I think is wrong, or supplement the serializer so that it builds ftl content by spans, or just separate the implementation of comparison. I don't know, I chose the easiest option, as I needed it urgently in my lsp server, and I don't have any problems with it so far.

@zbraniecki
Copy link
Collaborator

zbraniecki commented Nov 18, 2024

And I don't quite understand your point. Are you proposing to introduce additional methods for fields to compare structures and their fields? Or to compare all fields of node structures separately from PartialEq and Eq in separate methods?

I'm raising a concern that semantically the following code should pass:

let node1 = Node {
  value: "foo",
  span: span!(0, 4),
};

let node2 = Node {
  value: "foo",
  span: span!(5, 11),
};
assert_ne!(node1, node2);

because those two nodes are not equal. Their content is different.

Now, what is true is that in most cases we care about the actual content of the node, not its meta information. We can explicitly achieve that by doing:

let node1 = Node {
  value: "foo",
  span: span!(0, 4),
};

let node2 = Node {
  value: "foo",
  span: span!(5, 11),
};
assert_ne!(node1, node2);

// Option 1:
assert_eq!(node1.content, node2.content);

// Option 2:
assert_eq!(node1.cmp_content(&node2));

Or we can do what is proposed in the PR and add:

let node1 = Node {
  value: "foo",
  span: span!(0, 4),
};

let node2 = Node {
  value: "foo",
  span: span!(5, 11),
};
assert_eq!(node1, node2);

assert_ne!(node1.span, node2.span);
assert_eq!(node1.cmp_span(&node2));

I'm not sure what is the most common approach to AST comparisons with spans. I'd suggest checking prior art in other parser/AST/serializer models.

@Ertanic
Copy link
Author

Ertanic commented Jan 25, 2025

I apologize for missing, been a bit busy. You're right that spans are part of the object metadata, so I removed the weird manual implementation of the comparison and just pre-formatted the actual version with the correct spans that are compared in the test.

@Ertanic Ertanic requested a review from zbraniecki February 2, 2025 20:22
@alerque alerque self-requested a review February 17, 2025 12:49
@Ertanic
Copy link
Author

Ertanic commented Mar 8, 2025

Okay...

Copy link
Collaborator

@zbraniecki zbraniecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alerque feel free to decide if you want to merge or agree that we should use Range here.
The rest looks good to me! Thank you @Ertanic!

Copy link
Collaborator

@zbraniecki zbraniecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, except that I wouldn't suggest enabling spans by default - the default use is to parse for runtime use of the AST, not for editor cycle.

I also like the wrapping of Range because it allows us to swap for the new Range type once it stabilizes and enable Copy - https://hackmd.io/@uhs6rVdLTSS0gnie4q0fqA/ryjYJW2pa

@Ertanic
Copy link
Author

Ertanic commented Mar 11, 2025

I'm aware of the feature, I fixed it a long time ago, but the commit hung
image

@Ertanic
Copy link
Author

Ertanic commented Mar 12, 2025

I don't know why github can't output my commit, but fork has it: 460c365

@Ertanic
Copy link
Author

Ertanic commented Mar 17, 2025

Had to play around with reverting to fix the github. @zbraniecki

@alerque alerque self-assigned this May 20, 2025
@alerque
Copy link
Collaborator

alerque commented May 20, 2025

Is there a Git attribute we need to set to make checking out the new fixtures that explicitly test carriage returns from getting smudged on checkout?

@Ertanic
Copy link
Author

Ertanic commented May 21, 2025

I couldn't think of anything better than just disabling the documentation tests. Otherwise it's too much work to modify the documentation, which I think will only make it harder to read. From the suggestions to create a separate task for testing documentation, skipping the spans feature.

@alerque alerque force-pushed the span branch 2 times, most recently from e3722f0 to bbab212 Compare May 21, 2025 09:02
@alerque alerque added enhancement crate:fluent-syntax Issues related to fluent-syntax crate crate:fluent-bundle Issues related to fluent-bundle crate labels May 21, 2025
@@ -63,3 +64,6 @@ required-features = ["json"]
name = "parser_fixtures"
path = "tests/parser_fixtures.rs"
required-features = ["json"]

[lib]
doctest = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ertanic Why do you still think this exclusion is necessary given that spans is no longer a default feature?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the test task is started with the --all-features flag.

@alerque alerque linked an issue May 21, 2025 that may be closed by this pull request
@alerque alerque changed the title Added spans to AST nodes Add feature to add spans tracking source positions in AST nodes May 21, 2025
@Ertanic
Copy link
Author

Ertanic commented May 21, 2025

I've tried fixing the error during the parse_fixtures_compare test, even adding spans to the json files, but it still keeps generating the error of no span field in the json file.

@alerque
Copy link
Collaborator

alerque commented May 21, 2025

Please don't do any merging into this branch. I've force pushed several times to clean up the history and you keep pushing the dirty commits back into the timeline. Do feel free to work on it if there is something to do, but before committing either git pull --rebase to work from where this branch is (or git reset --hard <COMMIT> to the lastest commit you see here) and then just commit the changes on top of that. I can selectively squash fixes into the appropriate previous commits much easier than unraveling the cruft that comes in when you merge from your own outdated history. Thanks.

@alerque
Copy link
Collaborator

alerque commented May 21, 2025

Somehow the new fixture layout for un-normalized tests is running afoul of this heuristic:

https://github.com/projectfluent/fluent-rs/blob/main/fluent-syntax/src/serializer.rs#L425-L429

I can actually pass tests by not adding the extra carriage return there, but I also don't see a test for the "rare edge case" that is supposed to fix in the first place, so just bypassing it doesn't seem like a good plan.

Comment on lines 53 to 68
let formatted_path = path.parent().unwrap().join("formatted");

let formatted_content = fs::read_to_string(
formatted_path
.join("junked")
.join(path.file_name().unwrap()),
)
.unwrap();
assert_eq!(formatted_content, reserialized);
let formatted = parse(formatted_content.as_str()).unwrap_or_else(|(res, _)| res);

let formatted_content_without_junk =
fs::read_to_string(formatted_path.join(path.file_name().unwrap())).unwrap();
assert_eq!(formatted_content_without_junk, reserialized_without_junk);
let formatted_without_junk =
parse(formatted_content_without_junk.as_str()).unwrap_or_else(|(res, _)| res);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I fiddle with it trying to figure out what's going on with tests the more I think this approach to having expanded files for just the serialized message is not the right approach to testing. We should be able to test the original JSON fixtures and just use a query engine of some kind to extract the message bit and work around having serialized position data in our actual results.

With all these expanded and multiple entries in different formats in becomes much harder to audit what tests are actually doing and compare with other implementations.

Copy link
Collaborator

@alerque alerque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anywhere here the span results are actually being tested? Mostly I see the tests modified to try to dodge the span info, but we could use at least a couple inputs with the expected position data tested.

See my other comment about the pre-formated test results. I'm a bit skeptical about that being the most maintainable solution, especially since we can't even work out the line ending handling on the file reads. Having the expectation in JSON (as much as I actually despise JSON) does have the advantage of being unambiguous about what the EOL/EOF business is. How would those be updated? Can we backtrack and just use the JSON expectations and figure out how to query the relevant bits we need out of the upstream fixture expectations vs. our serialized data?

…dtrip_unnormalized_fixtures()`"

This reverts commit 0756bd1.
@alerque
Copy link
Collaborator

alerque commented May 22, 2025

I was pretty excited to get this feature landed before cutting a release, but I think I'm going to have to delay it. Hopefully we don't have to delay it long—I have no objection to doing a new release on a relatively short timeline as soon as this is actually ready.

My concerns are mostly related to testing, but I was playing around with fixing the tests and it feels like something is wrong with the scruct itself. Enabling the feature (or not) should not immediately break all existing usage of the serialization. That would make it very hard for existing systems with serialized data to migrate. If possible I would really like existing apps not to break the moment somebody tries enabling the feature. That may not be quite possible, but we should at least think through what the ramifications are and document how/why a change needs to be made to keep using existing code easily on the new version.

Until that gets thought through and tests pass both with and without the feature enabled I can't reasonably merge this, and since we have problems with old dependencies starting to hinder usage I kind of need to get an update out to address that.

Again I do want this feature to land and am willing to facilitate a release cycle special for it as soon as we're actually comfortable with the upgrade path if there are any breaking changes and tests work well enough to rely on.

@Ertanic
Copy link
Author

Ertanic commented May 23, 2025

I do not know how to fix the roundtrip_unnormalized_fixtures test, so I suggest hiding it for now behind the cfg(not(feature = "spans")) attribute?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crate:fluent-bundle Issues related to fluent-bundle crate crate:fluent-syntax Issues related to fluent-syntax crate enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose spans in parser / source positions in the AST
3 participants