Add feature to add spans tracking source positions in AST nodes #373
This would address #270, no? Have you run any benchmarks with/without this feature enabled? |
I had only been looking through open PRs. |
All in all, happy to see that! I was definitely hoping we'd gain this feature one day. Thank you for contributing!
My only major concern is that you use PartialEq to compare elements with different spans. I'm not sure how canonical it is to use PartialEq this way. We wrangled with the meaning and role of PartialEq in ICU4X for a long time, and I'm still not sure how to handle it, but in ICU4X we decided to stay on the cautious side and introduce a function like cmp_value to allow for comparisons that exclude part of the value.
A span can be thought of as metadata or as part of the element, and by using the trait this way you make it impossible to compare nodes including their spans.
I'm not sure what the right way around it is. If ASTs in Rust (or other languages) canonically do this, and it is not uncommon to compare while skipping spans, I'm fine with doing the same.
I didn't want to use a manual implementation of PartialEq, but I found it to be the most practical solution. I found several approaches on the Internet, including one where Span itself implements PartialEq so that it can be compared to others. I also looked at tree-sitter, where the range of a node is provided through a corresponding function. I don't quite understand your point, though. Are you proposing to introduce additional methods for comparing structures and their fields? Or to compare all fields of the node structures in separate methods, apart from PartialEq and Eq? Again, this is all for the sake of passing some tests that take one ftl file as input, serialize it into formatted ftl, and parse it again, so the nodes end up with different spans. The options are to change the input data of the tests, which I think is wrong, to extend the serializer so that it builds the ftl content according to spans, or to use a separate comparison implementation. I chose the easiest option because I needed this urgently in my LSP server, and I haven't had any problems with it so far. |
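For readers following along, the "manual implementation of PartialEq" discussed here could look roughly like the sketch below. The `Node` type and its fields are hypothetical stand-ins, not the actual fluent-syntax AST types, and the span is modeled as a plain `Range<usize>` purely for illustration.

```rust
use std::ops::Range;

/// Hypothetical AST node; the name and fields are illustrative only.
#[derive(Debug, Clone)]
pub struct Node {
    pub value: String,
    pub span: Range<usize>,
}

// Manual impl: equality looks only at `value` and deliberately skips `span`.
impl PartialEq for Node {
    fn eq(&self, other: &Self) -> bool {
        self.value == other.value
    }
}

/// A parse -> serialize -> parse round trip can move nodes around,
/// so the spans differ while the content stays the same.
pub fn demo_manual_eq() {
    let original = Node { value: "foo".to_string(), span: 0..3 };
    let reparsed = Node { value: "foo".to_string(), span: 5..8 };
    assert_eq!(original, reparsed); // passes because spans are ignored
    assert_ne!(original.span, reparsed.span); // the spans still differ
}
```

The trade-off is the one raised in the review: with this impl, `==` can no longer tell apart two nodes that differ only in position.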
I'm raising a concern that semantically the following code should pass:

let node1 = Node {
value: "foo",
span: span!(0, 4),
};
let node2 = Node {
value: "foo",
span: span!(5, 11),
};
assert_ne!(node1, node2);

because those two nodes are not equal. Their content is different. Now, what is true is that in most cases we care about the actual content of the node, not its meta information. We can explicitly achieve that by doing:

let node1 = Node {
value: "foo",
span: span!(0, 4),
};
let node2 = Node {
value: "foo",
span: span!(5, 11),
};
assert_ne!(node1, node2);
// Option 1:
assert_eq!(node1.value, node2.value);
// Option 2:
assert!(node1.cmp_content(&node2));

Or we can do what is proposed in the PR and add:

let node1 = Node {
value: "foo",
span: span!(0, 4),
};
let node2 = Node {
value: "foo",
span: span!(5, 11),
};
assert_eq!(node1, node2);
assert_ne!(node1.span, node2.span);
assert!(!node1.cmp_span(&node2));

I'm not sure what the most common approach to AST comparisons with spans is. I'd suggest checking prior art in other parser/AST/serializer models. |
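A compilable version of the "Option 2" idea might look like the following sketch. The `Span` and `Node` types and the `cmp_content` method are hypothetical names taken from the comment's pseudocode; a plain struct literal stands in for the `span!` macro, and nothing here reflects the crate's real API.

```rust
/// Hypothetical span type: a start and end offset into the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical node. The derived PartialEq compares every field,
/// including the span.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Node {
    pub value: &'static str,
    pub span: Span,
}

impl Node {
    /// Explicit content-only comparison that skips the span.
    pub fn cmp_content(&self, other: &Self) -> bool {
        self.value == other.value
    }
}

pub fn demo_cmp_content() {
    let node1 = Node { value: "foo", span: Span { start: 0, end: 4 } };
    let node2 = Node { value: "foo", span: Span { start: 5, end: 11 } };
    assert_ne!(node1, node2); // full equality sees the span
    assert!(node1.cmp_content(&node2)); // content-only comparison passes
    assert_ne!(node1.span, node2.span);
}
```

Here `==` keeps its strict meaning, and the looser comparison has to be requested by name, which is the cautious direction described for ICU4X.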
I apologize for the delay; I've been a bit busy. You're right that spans are part of the object's metadata, so I removed the odd manual comparison implementation and just pre-formatted the expected files, so the test now compares against versions with the correct spans. |
Okay... |
This looks good to me, except that I wouldn't suggest enabling spans by default: the default use is to parse for runtime use of the AST, not for an editor cycle.
I also like the wrapping of Range because it allows us to swap for the new Range type once it stabilizes and enable Copy - https://hackmd.io/@uhs6rVdLTSS0gnie4q0fqA/ryjYJW2pa
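To illustrate why wrapping `Range` is attractive, here is a minimal sketch of such a newtype. The `Span` definition and its methods are assumptions made for illustration; the PR's actual type is not shown in this thread. `std::ops::Range<usize>` is not `Copy` today, so the wrapper only derives `Clone`, and callers never touch the inner type directly.

```rust
use std::ops::Range;

/// Hypothetical wrapper around a byte range. Because the inner type is
/// private, it could later be replaced with a `Copy`-able range type
/// without changing this public surface.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Span(Range<usize>);

impl Span {
    pub fn new(start: usize, end: usize) -> Self {
        Span(start..end)
    }

    pub fn start(&self) -> usize {
        self.0.start
    }

    pub fn end(&self) -> usize {
        self.0.end
    }

    /// Returns the source text covered by the span, or `None` if the range
    /// is out of bounds or does not fall on character boundaries.
    pub fn slice<'s>(&self, source: &'s str) -> Option<&'s str> {
        source.get(self.0.clone())
    }
}

pub fn demo_span_wrapper() {
    let source = "hello = world";
    let span = Span::new(0, 5);
    assert_eq!(span.slice(source), Some("hello"));
    assert_eq!((span.start(), span.end()), (0, 5));
    assert_eq!(Span::new(20, 30).slice(source), None);
}
```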
I don't know why GitHub can't display my commit, but the fork has it: 460c365 |
I had to play around with reverting to fix GitHub. @zbraniecki |
Is there a Git attribute we need to set to keep the new fixtures that explicitly test carriage returns from getting smudged on checkout? |
I couldn't think of anything better than just disabling the documentation tests. Otherwise it's too much work to modify the documentation, which I think would only make it harder to read. One alternative would be to create a separate task for testing documentation that skips the spans feature. |
e3722f0 to bbab212
@@ -63,3 +64,6 @@ required-features = ["json"]
 name = "parser_fixtures"
 path = "tests/parser_fixtures.rs"
 required-features = ["json"]
+
+[lib]
+doctest = false
@Ertanic Why do you still think this exclusion is necessary given that spans is no longer a default feature?
Because the test task is started with the --all-features flag.
I've tried fixing the error during the |
Please don't do any merging into this branch. I've force pushed several times to clean up the history and you keep pushing the dirty commits back into the timeline. Do feel free to work on it if there is something to do, but before committing either |
…normalized_fixtures()` ...and compare with expected span data.
Somehow the new fixture layout for un-normalized tests is running afoul of this heuristic: https://github.com/projectfluent/fluent-rs/blob/main/fluent-syntax/src/serializer.rs#L425-L429 I can actually pass tests by not adding the extra carriage return there, but I also don't see a test for the "rare edge case" that it is supposed to fix in the first place, so just bypassing it doesn't seem like a good plan. |
let formatted_path = path.parent().unwrap().join("formatted");

let formatted_content = fs::read_to_string(
    formatted_path
        .join("junked")
        .join(path.file_name().unwrap()),
)
.unwrap();
assert_eq!(formatted_content, reserialized);
let formatted = parse(formatted_content.as_str()).unwrap_or_else(|(res, _)| res);

let formatted_content_without_junk =
    fs::read_to_string(formatted_path.join(path.file_name().unwrap())).unwrap();
assert_eq!(formatted_content_without_junk, reserialized_without_junk);
let formatted_without_junk =
    parse(formatted_content_without_junk.as_str()).unwrap_or_else(|(res, _)| res);
The more I fiddle with it trying to figure out what's going on with the tests, the more I think that keeping expanded files just for the serialized message is not the right approach to testing. We should be able to test against the original JSON fixtures and use a query engine of some kind to extract the message part, working around the serialized position data in our actual results.
With all these expanded files and multiple entries in different formats, it becomes much harder to audit what the tests are actually doing and to compare with other implementations.
Is there anywhere here the span results are actually being tested? Mostly I see the tests modified to try to dodge the span info, but we could use at least a couple of inputs with the expected position data tested.
See my other comment about the pre-formatted test results. I'm a bit skeptical about that being the most maintainable solution, especially since we can't even work out the line-ending handling on the file reads. Having the expectation in JSON (as much as I actually despise JSON) does have the advantage of being unambiguous about what the EOL/EOF business is. How would those be updated? Can we backtrack and just use the JSON expectations, and figure out how to query the relevant bits we need out of the upstream fixture expectations vs. our serialized data?
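As a rough illustration of what "expected position data tested" could mean, the sketch below checks explicit byte ranges against a known input. The `identifier_spans` function is a toy stand-in written for this example, not the fluent-syntax parser, and it only handles simple `key = value` lines that start at column zero. The input deliberately mixes CRLF and LF to show that line endings shift every later offset, which is why unambiguous fixtures matter.

```rust
use std::ops::Range;

/// Toy stand-in for a parser: returns the byte range of the identifier
/// on each `key = value` line. Assumes identifiers start at column zero.
pub fn identifier_spans(source: &str) -> Vec<Range<usize>> {
    let mut spans = Vec::new();
    let mut offset = 0;
    // `split_inclusive` keeps the line terminator, so offsets stay exact.
    for line in source.split_inclusive('\n') {
        if let Some(eq) = line.find('=') {
            let id = line[..eq].trim_end();
            if !id.is_empty() {
                spans.push(offset..offset + id.len());
            }
        }
        offset += line.len();
    }
    spans
}

pub fn demo_span_expectations() {
    // First line ends in CRLF (15 bytes), second in LF.
    let source = "hello = Hello\r\nbye = Bye\n";
    let spans = identifier_spans(source);
    assert_eq!(spans, vec![0..5, 15..18]);
    // Slicing the source with a span should give back the identifier text.
    assert_eq!(&source[spans[0].clone()], "hello");
    assert_eq!(&source[spans[1].clone()], "bye");
}
```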
…dtrip_unnormalized_fixtures()`" This reverts commit 0756bd1.
I was pretty excited to get this feature landed before cutting a release, but I think I'm going to have to delay it. Hopefully we don't have to delay it long; I have no objection to doing a new release on a relatively short timeline as soon as this is actually ready.
My concerns are mostly related to testing, but I was playing around with fixing the tests and it feels like something is wrong with the struct itself. Enabling the feature (or not) should not immediately break all existing usage of the serialization. That would make it very hard for existing systems with serialized data to migrate. If possible I would really like existing apps not to break the moment somebody tries enabling the feature. That may not be quite possible, but we should at least think through what the ramifications are and document how/why a change needs to be made to keep using existing code easily on the new version.
Until that gets thought through and tests pass both with and without the feature enabled I can't reasonably merge this, and since we have problems with old dependencies starting to hinder usage I kind of need to get an update out to address that. Again, I do want this feature to land and am willing to facilitate a release cycle specially for it as soon as we're actually comfortable with the upgrade path if there are any breaking changes and tests work well enough to rely on. |
I do not know how to fix the |
I needed a parser for .ftl files. I found tree-sitter-fluent, but for some reason it couldn't parse a valid file, throwing errors when trying to use replaceable expressions. I decided to use fluent-syntax, but the JavaScript version has node spans while the Rust version does not. This PR solves this issue, but since there is usually no need for spans, I hid them behind the spans feature.
Also, to avoid conflicts in tests, where the tree is reformatted and the spans change as a result, the PartialEq implementation for AST nodes is split into a derived implementation and a manual one.
It would be a good idea to write tests that check the spans, but I'm not sure how best to do that, and I need help with this.
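For context, hiding a field behind a Cargo feature is commonly done with `#[cfg(feature = "...")]` attributes, roughly as in the sketch below. The `Identifier` struct and the `identifier` helper are hypothetical and are not copied from the PR; only the feature name `spans` comes from the description above.

```rust
#[cfg(feature = "spans")]
use std::ops::Range;

/// Hypothetical AST node. The `span` field exists only when the crate is
/// built with the `spans` feature, so default builds pay no cost for it.
#[derive(Debug, Clone, PartialEq)]
pub struct Identifier {
    pub name: String,
    #[cfg(feature = "spans")]
    pub span: Range<usize>,
}

/// Hypothetical constructor that compiles either way; the offsets are
/// simply unused when the feature is disabled.
#[allow(unused_variables)]
pub fn identifier(name: &str, start: usize, end: usize) -> Identifier {
    Identifier {
        name: name.to_string(),
        #[cfg(feature = "spans")]
        span: start..end,
    }
}
```

One consequence worth noting is that any code constructing such a struct with a literal has to be gated in the same way, which is part of why enabling the feature can break existing callers.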