Improve handling of supertypes with foreign identifiers #5888
base: next
I think this is the wrong place to do this. I would rather leave the supertype names we get from Implements statements as they are (they are checked by the compiler) and only sanitize supertype names provided via the typelib.
As I mentioned in the OP, this will cause the tests on the implicit worksheet inspection to fail because it does not handle the bracketed version. I want to avoid bracketing the name conditionally because that would devolve into a game of whack-a-mole, trying to find every offending character that could appear in a supertype name. When running the bracketed name via the expression parser in the …
As stated before, I am not a fan of the pessimistic approach, but I can accept it if everybody else is OK with always applying brackets in the type hierarchy pass. Just one question: is there a possibility of a dot inside a foreign name in a supertype name? The splitting logic would fail in such a case. I have two more remarks. For one, I think we should use something like "unsanitizable" instead of "insane" in the log entries. …
Regarding the foreign name identifier: MS-VBAL 3.3.5.3 does not explicitly prohibit a dot in the identifier, so we must treat it as a possibility. Indeed, as I noted in #5881, an Access control can actually contain a dot in its name, and such a member can appear in the type information API. However, that member would be impossible to reference from VBA code; the only references we would encounter in VBA code are to the underscored version of the control name. I could not produce an example of a foreign name with a dot. Excel is much stricter than Access in this regard, but even so, Access will not create a member with such an identifier for use in VBA. Furthermore, given the identifier rules of the C language, it seems very unlikely that an external type library could contain a dot character in an identifier. In fact, section 4.2.1.2 of the Open Group's Document C706, which is used as the base for the IDL grammar, indicates that it follows the C language rules. This means a type info that contains such an identifier is not legal. The internal type library we are accessing is likely not compliant, which is probably why we see two members defined for the same Access control, with the underscored version being the one used in the VBA codebase. In such cases, the splitting will yield a nonsensical result, which means such a member will not be matched and will be logged. I think that is acceptable. Regarding string vs. StringBuilder: I was unsure whether there was a benefit to using StringBuilder for a small number of concatenations, since we cannot have more than 4 parts in a fully qualified name, and my initial research indicated that for a small number of concatenations (e.g. fewer than 5), string is faster than …
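To illustrate the point about a dot inside a foreign name, here is a small Python sketch (not Rubberduck's C# code). The name `Access.[My.Control]` and both helper functions are hypothetical; the sketch only assumes that the name is split naively on `.`, as the comment describes.

```python
def naive_split(name):
    """Split a qualified name on every dot, ignoring brackets (assumed behavior)."""
    return name.split(".")


def unbalanced_parts(name):
    """Return the parts whose opening and closing bracket counts differ."""
    return [p for p in naive_split(name)
            if p.count("[") != p.count("]")]


# Hypothetical foreign name that contains a dot inside the brackets.
example = "Access.[My.Control]"
print(naive_split(example))
print(unbalanced_parts(example))
```

Because the bracketed part is cut in two, both halves end up with unbalanced brackets, so a name like this would be rejected and logged rather than matched.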
I want to check on a possible alternative before finalizing the PR. Would it be preferable to provide a custom parser for handling the identifiers? We can use a rule similar to this:
With this, we can handle both bracketed and unbracketed versions more nicely:
The downside is that we would need a new path to handle the binding since they would no longer be a … Is this path worth pursuing?
Marking as [hacktoberfest-accepted] because the exploration of this is merited, whether or not the PR is merged in the end. As such, it should count towards Hacktoberfest where possible.
Want to follow up on this and see if you have any thoughts about the alternative path proposed so I can wrap this PR up. Thanks! |
Any news here? Greetings |
Sorry, it has been quite some time since I last found time to check on Rubberduck. If the special parser is only used in the supertype resolution, that might be doable. However, we will need a separate resolver for this, since the context types are different. Changing anything around this in the normal parser seems prohibitively complicated, since it would require rewiring the entire expression parsing. I would suggest wrapping this up with the original approach for now, just with the word "insane" replaced by something less drastic.
…nto ImrpoveSuperTypeResolution
…nto ImrpoveSuperTypeResolution
…o perform better even for a small amount of concatenation.
… was observed from other PRs
…ts, which help speed up the testing cycle a bit.
❌ Build Rubberduck 2.5.2.6231 failed (commit f4419f470c by @bclothier) |
✅ Build Rubberduck 2.5.2.6232 completed (commit cae8fa7d60 by @bclothier) |
Partially addresses the warnings from #5881
Because certain supertypes can have foreign names, they may require bracketing. However, since we also need to handle cases where the supertype names are already bracketed or contain qualified names (e.g. `Excel.Worksheet` or `VBAProject.Class1`), we need to be pessimistic about the legality of the name when resolving the supertype in the `TypeHierarchyPass`. The new method will attempt to split the supertype name on the `.` and bracket the components if possible, rejecting any forms where the brackets are somewhere in the middle of the name or are unbalanced. Rejected names will be logged as insane.
My original attempt was to bracket when adding the supertype name in the `ClassModuleDeclaration`, but this caused some tests around implicit references to fail. For that reason, I decided that modifying the logic in the `TypeHierarchyPass` was likely the less invasive choice, given that this is where we pass the string into a parser.
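The split-and-bracket behavior described above can be sketched roughly as follows. This is a Python illustration of the idea only, not the C# method added in this PR. The function name, the logger name, and the exact rejection rules are assumptions based on the description.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("supertype_sketch")


def bracket_supertype_name(name):
    """Bracket every dot-separated part of a supertype name.

    Returns the bracketed name, or None when the name is rejected
    (empty part, or brackets that are unbalanced or mid-name).
    """
    bracketed = []
    for part in name.split("."):
        # Accept a part that is already fully enclosed in one pair of brackets.
        if len(part) >= 2 and part.startswith("[") and part.endswith("]"):
            inner = part[1:-1]
        else:
            inner = part
        # Any bracket left over is either unbalanced or in the middle of the name.
        if not inner or "[" in inner or "]" in inner:
            logger.warning("Rejected supertype name: %s", name)
            return None
        bracketed.append("[" + inner + "]")
    return ".".join(bracketed)


print(bracket_supertype_name("Excel.Worksheet"))
print(bracket_supertype_name("[VBAProject].Class1"))
print(bracket_supertype_name("Wor[k]sheet"))
```

Always bracketing, rather than bracketing only when an unusual character is detected, matches the pessimistic approach discussed in the conversation: the sketch never has to enumerate which characters are problematic.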