Store keywords in enum ~2x perf. improvement #193
Performance-wise it's about 2x as fast for the two example queries.
Sorry about the delay, this took some thinking. I like this direction, but not because of the reasons you list:
Rather, I like that this will allow matching on regular
I think we should wait for the current PRs (except for my WIP one) to be finished before merging such major changes. And I'll need to think more about the specifics of this PR.
For this PR I'd like to:
Are you willing to work on that? Next we'll be able to do the
The clone kludge in `_ => self.expected("date/time field", Token::Word(w.clone())))` will become unnecessary once we stop using a separate match for the keywords, as suggested in https://github.com/andygrove/sqlparser-rs/pull/193#issuecomment-641607194
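Below is a minimal sketch of what dropping the separate keyword match could look like. It assumes simplified stand-in types: `Keyword`, `Word`, `Token`, `DateTimeField` and the free function `expected` are hypothetical and only loosely mirror the parser's real types. A `Copy` enum can appear directly inside a pattern, which a `String` field compared against a literal cannot. That lets the keyword check become part of the token pattern, so the catch-all arm receives the whole token by value and nothing has to be cloned or rebuilt.

```rust
// Hypothetical, simplified stand-ins for the parser's token types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Keyword {
    YEAR,
    MONTH,
    DAY,
    NoKeyword,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Word {
    value: String,
    keyword: Keyword,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(Word),
    Number(String),
    EOF,
}

#[derive(Debug, PartialEq)]
enum DateTimeField {
    Year,
    Month,
    Day,
}

// Hypothetical error helper in the spirit of `self.expected(...)`.
fn expected<T>(what: &str, found: Token) -> Result<T, String> {
    Err(format!("Expected {}, found {:?}", what, found))
}

fn parse_date_time_field(token: Token) -> Result<DateTimeField, String> {
    // A single match: the keyword is matched inside the token pattern,
    // so the catch-all arm owns the unexpected token and no clone is needed.
    match token {
        Token::Word(Word { keyword: Keyword::YEAR, .. }) => Ok(DateTimeField::Year),
        Token::Word(Word { keyword: Keyword::MONTH, .. }) => Ok(DateTimeField::Month),
        Token::Word(Word { keyword: Keyword::DAY, .. }) => Ok(DateTimeField::Day),
        unexpected => expected("date/time field", unexpected),
    }
}
```

The same catch-all arm also handles `Token::EOF` and non-word tokens, so they need no special case.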
Sounds like a good way forward! This will provide both the ergonomic benefits and the performance benefit (though I agree the latter is less relevant for many users and for further development).
Great! How do you want to proceed? Would you like me to hold off merging #195 (I can rebase it once this is landed, if you prefer)? Also note that I didn't mean to dismiss the performance requirements of certain users, I just want to understand what those are, so that we can prioritize appropriately.
I am fine with merging #195 first! As for performance, the biggest change comes from reducing the number of string comparisons by converting them as early as possible. For your
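Here is a rough sketch of "converting as early as possible". It assumes hypothetical `make_word` and `parse_keyword` helpers, which are not the crate's actual API. The case-insensitive string work happens once, when a word token is created, and every later keyword check in the parser is a plain enum comparison. For brevity the sketch resolves keywords with a `match` on the uppercased string; a real lookup could use a sorted table instead.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Keyword {
    SELECT,
    FROM,
    NoKeyword,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Word {
    value: String,
    keyword: Keyword,
}

// Tokenizer side: the string comparison happens once per word, here.
fn make_word(value: &str) -> Word {
    let keyword = match value.to_ascii_uppercase().as_str() {
        "SELECT" => Keyword::SELECT,
        "FROM" => Keyword::FROM,
        _ => Keyword::NoKeyword,
    };
    Word { value: value.to_string(), keyword }
}

// Parser side: each keyword check is a cheap enum comparison.
// Running past the end of input simply yields `false`.
fn parse_keyword(words: &[Word], pos: &mut usize, expected: Keyword) -> bool {
    match words.get(*pos) {
        Some(w) if w.keyword == expected => {
            *pos += 1;
            true
        }
        _ => false,
    }
}
```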
Ah, I think you mean:
Which was already there before this PR.
This simplifies the code slightly, removing the need to deal with the EOF case explicitly.
That's the one. There's a similar one in
Yeah, I see, that will mostly have the same impact. So, the other checks will mostly convert to pointer/length or pointer/length/memcmp, but indeed, those should be cheap compared to the
I implemented the changes and merged with master:
Is there anything you think is worth splitting into another PR? Also, I am not sure whether we should change anything about `RESERVED_FOR_TABLE_ALIAS` and `RESERVED_FOR_COLUMN_ALIAS`, as they are listed just once.
Some style improvement might also be to change
into
and
or something like that.
The other changes are all related to changing the keyword to this Keyword type.
Related, sure they are. There are lots of mechanical changes, though, and it's been hard to find the manual changes among the automated ones; it's those changes that need review. I had comments about the ones I did find.
If you had used `KWD![FOO]` instead of `Keyword::FOO`, you could have made one commit introducing `KWD` and replacing `"KEYWORD"` with `KWD![KEYWORD]` while keeping the string keywords, and another commit switching the keyword storage.
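A minimal sketch of that two-commit approach follows. It assumes a hypothetical `KWD!` macro, which was only suggested in review and not adopted. In the first stage the macro still expands to the keyword's string form, so call sites can be rewritten mechanically without changing behavior. In the second stage only the macro definition changes, to expand to an enum variant. The two modules stand in for the two commits.

```rust
// Stage 1 (first commit): call sites use KWD!, which still yields a string.
mod stage1 {
    macro_rules! KWD {
        [$kw:ident] => { stringify!($kw) };
    }

    pub fn is_select(word: &str) -> bool {
        word.eq_ignore_ascii_case(KWD![SELECT])
    }
}

// Stage 2 (second commit): only the macro body changes; the call site
// now compares enum values instead of strings.
mod stage2 {
    #[allow(dead_code)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Keyword {
        SELECT,
        NoKeyword,
    }

    macro_rules! KWD {
        [$kw:ident] => { Keyword::$kw };
    }

    pub fn is_select(keyword: Keyword) -> bool {
        keyword == KWD![SELECT]
    }
}
```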
I don't think it's necessary now, but I'd appreciate if you highlighted the spots in parser.rs where you did manual changes in case I missed some.
Yep, no point in keeping them.
No, I don't think it's worth it, given it's a small helper that's used only a few times.
@Dandandan, are you still working on this, or is it ready to merge?
I just addressed the comment "I'd like to keep the logical grouping and ordering here, as well as the comments." by reverting the change that used binary search for the reserved keywords. It seems to affect performance only a little (because those lists are relatively small), but it makes sense to keep them together. So now it is good to go from my side! I also experimented a bit with using
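The trade-off described here can be sketched as follows. The sketch assumes a hypothetical `Keyword` enum and an illustrative subset of entries, not the crate's actual list. Keeping `RESERVED_FOR_TABLE_ALIAS` in its logical, commented order means it cannot be binary-searched, but a linear `contains` over a short slice of `Copy` enum values is cheap.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Keyword {
    CROSS,
    FULL,
    GROUP,
    HAVING,
    JOIN,
    LEFT,
    LIMIT,
    ON,
    ORDER,
    SELECT,
    WHERE,
    WITH,
    NoKeyword,
}

// Grouped and commented by meaning rather than sorted (illustrative subset).
const RESERVED_FOR_TABLE_ALIAS: &[Keyword] = &[
    // Query clause keywords:
    Keyword::WITH,
    Keyword::SELECT,
    Keyword::WHERE,
    Keyword::GROUP,
    Keyword::HAVING,
    Keyword::ORDER,
    Keyword::LIMIT,
    // Join-related keywords:
    Keyword::ON,
    Keyword::JOIN,
    Keyword::CROSS,
    Keyword::FULL,
    Keyword::LEFT,
];

fn is_reserved_for_table_alias(keyword: Keyword) -> bool {
    // `contains` is a linear scan; it does not require sorted input,
    // and for a short list of enum values it is cheap.
    RESERVED_FOR_TABLE_ALIAS.contains(&keyword)
}
```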
Awesome, thanks!
Thanks for the extensive feedback!
Proposal:
- Store keywords in an enum.
- Store the enum in a sorted array in order to do a lookup.
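Here is a minimal sketch of the proposal. It assumes a hand-written enum and hypothetical names (`ALL_KEYWORDS`, `KEYWORD_VALUES`, `lookup_keyword`); the PR's actual code may build these tables differently. The keywords' string forms are kept in a sorted array, with a parallel array of enum values at the same indices. That way a word can be mapped to its `Keyword` with one binary search.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Keyword {
    AND,
    FROM,
    SELECT,
    WHERE,
    // Marker for a word that is not a keyword.
    NoKeyword,
}

// String forms; must stay in ascending order for `binary_search`.
const ALL_KEYWORDS: &[&str] = &["AND", "FROM", "SELECT", "WHERE"];

// Enum values at the same indices as ALL_KEYWORDS.
const KEYWORD_VALUES: &[Keyword] = &[
    Keyword::AND,
    Keyword::FROM,
    Keyword::SELECT,
    Keyword::WHERE,
];

fn lookup_keyword(word: &str) -> Keyword {
    // SQL keywords are case-insensitive, so normalize before searching.
    let upper = word.to_ascii_uppercase();
    match ALL_KEYWORDS.binary_search(&upper.as_str()) {
        Ok(index) => KEYWORD_VALUES[index],
        Err(_) => Keyword::NoKeyword,
    }
}
```

The two arrays must stay aligned and sorted. Generating both from a single list, for example with a `macro_rules!` macro, is a natural refinement.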
This provides some benefits:
Any ideas on this?