Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of supertypes with foreign identifiers #5888

Open
wants to merge 7 commits into
base: next
Choose a base branch
from

Conversation

bclothier
Copy link
Contributor

Partially addresses the warnings from #5881

Because certain supertypes can have foreign names, they may require bracketing. However, we also need to handle cases where the supertype names might be already bracketed or contains qualified names (e.g. Excel.Worksheet or VBAProject.Class1), we need to be pessimistic about the legality of the name when resolving the supertype in the TypeHierarchyPass. The new method introduced will attempt to split the supertype name on the . and bracket the components if possible, rejecting any forms where the brackets are somewhere in middle of the name or are unbalanced. Rejected names will be logged as insane.

My original attempt was to bracket when adding supertype name in the ClassModuleDeclaration but this caused some tests around implicit references to fail. For that reason, I decided that modifying the logic in the TypeHierarchyPass was likely the less invasive choice given that this is where we pass the string into a parser.

Copy link
Contributor

@MDoerner MDoerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the wrong place to do this. I would rather leave the supertype names we get from implements statements as they are (They are checked by the compiler.) and only sanitize supertype names provided via the typelib.

@bclothier
Copy link
Contributor Author

As I mentioned in the OP, this will cause the tests on the implicit worksheet inspection to fail because it does not handle the bracketed version. I want to avoid making it conditional to bracket the name because that would devolve into whack-the-mole game trying to find all possible offensive characters that could appear in a supertype name. When running the bracketed name via the expression parser in the TypeHierarchyPass, it gets stripped away for the purpose of comparing against other declarations. For that reason, bracketing can be unconditionally in the TypeHierarchyPass, not so in the ReferenceResolverRunnerBase.

@MDoerner
Copy link
Contributor

As stated before, I am not a fan of the pessimistic approach, but if everybody else is OK with always applying brackets in the type hierarchy pass.

Just one question, is there a possibility that there will be a dot inside a foreign name in a supertype name? The splitting logic would fail in such a case.

I have two more remarks. For one, I think we should use something like unsanitizable instead of insane in the log entries.
The second thing is that this code runs a lot on a hot path. So, it might benefit from using a string builder, if there are multiple parts.

@bclothier
Copy link
Contributor Author

bclothier commented Oct 25, 2021

Regarding the foreign name identifier:

MS-VBAL 3.3.5.3 does not explicitly restrict against a dot in the identifier. We must treat this as a possibility. Indeed as I noted in #5881, an Access control actually could contain a dot in its name and such member can appear in the type information API. However that member would be impossible to reference from VBA code; only references we'd encounter with VBA code is the underlined version of the control name. I could not produce an example of a foreign name with a dot -- Excel is much more strict than Access in this regards but even so, Access will not create a member with such identifier for use in VBA.

Furthermore when we consider the identifier rules for C language, it seems very unlikely that an external type library would be capable of containing a dot character for an identifier. In fact, Open Group's Document C706, in section 4.2.1.2 which is used as the base for IDL grammar indicates that it follows the C language rules. This means a type info that contains such identifier is not legal. The internal type library we are accessing likely is not compliant, which is probably why we see 2 members defined for the same Access control with the underlined version being used in the VBA codebase.

In such cases, the splitting will yield a nonsensical result which means such member will not be matched and will be logged. I think that is acceptable.

Regarding the string vs. StringBuilder. I was unsure whether there was a benefit to using SB for a small number of concatenation since we cannot have more than 4 parts in a fully qualified name and my initial research indicated that for a small number of concatenations (e.g. less than 5), string is faster than StringBuilder. However, I looked again and this blog suggests that reusing the StringBuilder and calling Clear may yield better performance. I will take some time to measure the performance and adjust accordingly.

@bclothier
Copy link
Contributor Author

@MDoerner ,

I want to check in a possible alternative before finalizing the PR. Would it be preferable to provide a custom parser for handling the identifiers. We can use a rule similar to this:

identifierExpression:
    identifier                                                          # bareIdentifier
    | (identifier | foreignName) ((DOT (identifier | foreignName))*)?   # qualifiedIdentifierExpr
    | foreignIdentifier (foreignIdentifier*)?                           # bareForeignName
;

With this, we can handle both bracketed and unbracketed versions more nicely:


        [Test]
        [TestCase("A B", "A B", nameof(VBAParser.BareForeignNameContext))]
        [TestCase("Worksheet", "Worksheet", nameof(VBAParser.BareIdentifierContext))]
        [TestCase("a.b", "a.b", nameof(VBAParser.QualifiedIdentifierExprContext))]
        [TestCase("a[b]c", "a[b]c", nameof(VBAParser.BareForeignNameContext))]
        [TestCase("a[b.c]d", "a[b.c]d", nameof(VBAParser.BareForeignNameContext))]
        [TestCase("[a[b.c]d].[e[f.g]h]", "[a[b.c]d].[e[f.g]h]", nameof(VBAParser.QualifiedIdentifierExprContext))]
        [TestCase("Excel.Worksheet", "Excel.Worksheet", nameof(VBAParser.QualifiedIdentifierExprContext))]
        [TestCase("[Excel].[Worksheet]", "[Excel].[Worksheet]", nameof(VBAParser.QualifiedIdentifierExprContext))]
        public void Test(string expression, string expected, string ruleName)
        {
            var parser = new VBAExpressionParser();
            var tree = parser.ParseIdentifier(expression);
            var ctx = (VBAParser.IdentifierExpressionContext)tree;

            Assert.AreEqual(expected, tree.GetText());
            var actualRule = string.Empty;
            switch(tree)
            {
                case VBAParser.BareForeignNameContext bareForeignNameContext:
                    actualRule = nameof(VBAParser.BareForeignNameContext);
                    break;
                case VBAParser.BareIdentifierContext bareIdentifierContext:
                    actualRule = nameof(VBAParser.BareIdentifierContext);
                    break;
                case VBAParser.QualifiedIdentifierExprContext qualifiedIdentifierExprContext:
                    actualRule = nameof(VBAParser.QualifiedIdentifierExprContext);
                    break;
            }
            Assert.AreEqual(ruleName, actualRule);
        }

The downside is that we would need a new path to handle the binding since they would be no longer a simpleNameExpr or expression (unless I have missed a way to get ANTLR to consider it equivalent).

Is this path worth pursuing?

@Vogel612 Vogel612 added the hacktoberfest-accepted Thank you! Get that t-shirt! label Nov 1, 2021
@Vogel612
Copy link
Member

Vogel612 commented Nov 1, 2021

marking as [hacktoberfest-accepted], because the exploration of this is merited, whether the PR is merged in the end or not. As such it should count towards hacktoberfest, where possible

@bclothier
Copy link
Contributor Author

@MDoerner

Is this path worth pursuing?

Want to follow up on this and see if you have any thoughts about the alternative path proposed so I can wrap this PR up.

Thanks!

@Tanarri
Copy link
Contributor

Tanarri commented Sep 21, 2022

Any news here?
Do you need any testers for this fix?

Greetings

@MDoerner
Copy link
Contributor

Sorry, it has been quite some time since I last found time to check on Rubberduck.

If the special parser is only used in the supertype resolution, that might be doable. However, we will need a separate resolver for this, since the context types are different.

Changing anything around this in the normal parser seems to be prohibitively complicated since it would require to rewire the entire expression parsing.

I would suggest to wrap thus up with the original approach for now, just with the word insane substituted for something less drastic.

@rubberduck-vba-releasebot
Copy link
Member

@rubberduck-vba-releasebot
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hacktoberfest-accepted Thank you! Get that t-shirt!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants