-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[C23] Fix typeof handling in enum declarations #146394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We have a parsing helper function which parses either a parenthesized expression or a parenthesized type name. This is used when parsing a unary operator such as sizeof, for example. The problem this solves is when that construct is ambiguous. Consider: enum E : typeof(int) { F }; After we've parsed the 'typeof', what ParseParenExpression() is responsible for is '(int) { F }' which looks like a compound literal expression when it's actually the parens and operand for 'typeof' followed by the enumerator list for the enumeration declaration. Then consider: sizeof (int){ 0 }; After we've parsed 'sizeof', ParseParenExpression is responsible for parsing something grammatically similar to the problematic case. The solution is to recognize that the expression form of 'typeof' is required to have parentheses. So we know the open and close parens that ParseParenExpression handles must be part of the grammar production for the operator, not part of the operand expression itself. Fixes llvm#146351
@llvm/pr-subscribers-clang Author: Aaron Ballman (AaronBallman) ChangesWe have a parsing helper function which parses either a parenthesized expression or a parenthesized type name. This is used when parsing a unary operator such as sizeof, for example. The problem this solves is when that construct is ambiguous. Consider:
After we've parsed the 'typeof', what ParseParenExpression() is responsible for is '(int) { F }' which looks like a compound literal expression when it's actually the parens and operand for 'typeof' followed by the enumerator list for the enumeration declaration. Then consider:
After we've parsed 'sizeof', ParseParenExpression is responsible for parsing something grammatically similar to the problematic case. The solution is to recognize that the expression form of 'typeof' is required to have parentheses. So we know the open and close parens that ParseParenExpression handles must be part of the grammar production for the operator, not part of the operand expression itself. Fixes #146351 Full diff: https://github.com/llvm/llvm-project/pull/146394.diff 5 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 45b5f77545361..1b12061dad85b 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -304,6 +304,8 @@ C23 Feature Support
which clarified how Clang is handling underspecified object declarations.
- Clang now accepts single variadic parameter in type-name. It's a part of
`WG14 N2975 <https://open-std.org/JTC1/SC22/WG14/www/docs/n2975.pdf>`_
+- Fixed a bug with handling the type operand form of ``typeof`` when it is used
+ to specify a fixed underlying type for an enumeration. #GH146351
C11 Feature Support
^^^^^^^^^^^^^^^^^^^
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index a47e23ffbd357..ced9613b9ab9b 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -4184,7 +4184,12 @@ class Parser : public CodeCompletionHandler {
/// ParseParenExpression - This parses the unit that starts with a '(' token,
/// based on what is allowed by ExprType. The actual thing parsed is returned
/// in ExprType. If stopIfCastExpr is true, it will only return the parsed
- /// type, not the parsed cast-expression.
+ /// type, not the parsed cast-expression. If ParenKnownToBeNonCast is true,
+ /// the initial open paren and its matching close paren are known to be part
+ /// of another grammar production and not part of the operand. e.g., the
+ /// typeof and typeof_unqual operators in C. If it is false, then the function
+ /// has to parse the parens to determine whether they're part of a cast or
+ /// compound literal expression rather than a parenthesized type.
///
/// \verbatim
/// primary-expression: [C99 6.5.1]
@@ -4210,6 +4215,7 @@ class Parser : public CodeCompletionHandler {
/// \endverbatim
ExprResult ParseParenExpression(ParenParseOption &ExprType,
bool stopIfCastExpr, bool isTypeCast,
+ bool ParenKnownToBeNonCast,
ParsedType &CastTy,
SourceLocation &RParenLoc);
diff --git a/clang/lib/Parse/ParseExpr.cpp b/clang/lib/Parse/ParseExpr.cpp
index 3cf3d4ea7d705..c6731fca7bfa7 100644
--- a/clang/lib/Parse/ParseExpr.cpp
+++ b/clang/lib/Parse/ParseExpr.cpp
@@ -758,7 +758,8 @@ ExprResult Parser::ParseCastExpression(CastParseKind ParseKind,
ParsedType CastTy;
SourceLocation RParenLoc;
Res = ParseParenExpression(ParenExprType, false /*stopIfCastExr*/,
- isTypeCast == TypeCastState::IsTypeCast, CastTy,
+ isTypeCast == TypeCastState::IsTypeCast,
+ false /*ParenKnownToBeNonCast*/, CastTy,
RParenLoc);
// FIXME: What should we do if a vector literal is followed by a
@@ -2110,12 +2111,24 @@ Parser::ParseExprAfterUnaryExprOrTypeTrait(const Token &OpTok,
// If it starts with a '(', we know that it is either a parenthesized
// type-name, or it is a unary-expression that starts with a compound
// literal, or starts with a primary-expression that is a parenthesized
- // expression.
+ // expression. Most unary operators have an expression form without parens
+ // as part of the grammar for the operator, and a type form with the parens
+ // as part of the grammar for the operator. However, typeof and
+ // typeof_unqual require parens for both forms. This means that we *know*
+ // that the open and close parens cannot be part of a cast expression,
+ // which means we definitely are not parsing a compound literal expression.
+ // This disambiguates a case like enum E : typeof(int) { }; where we've
+ // parsed typeof and need to handle the (int){} tokens properly despite
+ // them looking like a compound literal, as in sizeof (int){}; where the
+ // parens could be part of a parenthesized type name or for a cast
+ // expression of some kind.
+ bool ParenKnownToBeNonCast =
+ OpTok.isOneOf(tok::kw_typeof, tok::kw_typeof_unqual);
ParenParseOption ExprType = ParenParseOption::CastExpr;
SourceLocation LParenLoc = Tok.getLocation(), RParenLoc;
- Operand = ParseParenExpression(ExprType, true/*stopIfCastExpr*/,
- false, CastTy, RParenLoc);
+ Operand = ParseParenExpression(ExprType, true /*stopIfCastExpr*/, false,
+ ParenKnownToBeNonCast, CastTy, RParenLoc);
CastRange = SourceRange(LParenLoc, RParenLoc);
// If ParseParenExpression parsed a '(typename)' sequence only, then this is
@@ -2589,10 +2602,11 @@ bool Parser::tryParseOpenMPArrayShapingCastPart() {
return !ErrorFound;
}
-ExprResult
-Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr,
- bool isTypeCast, ParsedType &CastTy,
- SourceLocation &RParenLoc) {
+ExprResult Parser::ParseParenExpression(ParenParseOption &ExprType,
+ bool stopIfCastExpr, bool isTypeCast,
+ bool ParenKnownToBeNonCast,
+ ParsedType &CastTy,
+ SourceLocation &RParenLoc) {
assert(Tok.is(tok::l_paren) && "Not a paren expr!");
ColonProtectionRAIIObject ColonProtection(*this, false);
BalancedDelimiterTracker T(*this, tok::l_paren);
@@ -2747,7 +2761,7 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr,
T.consumeClose();
ColonProtection.restore();
RParenLoc = T.getCloseLocation();
- if (Tok.is(tok::l_brace)) {
+ if (!ParenKnownToBeNonCast && Tok.is(tok::l_brace)) {
ExprType = ParenParseOption::CompoundLiteral;
TypeResult Ty;
{
@@ -2757,7 +2771,7 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr,
return ParseCompoundLiteralExpression(Ty.get(), OpenLoc, RParenLoc);
}
- if (Tok.is(tok::l_paren)) {
+ if (!ParenKnownToBeNonCast && Tok.is(tok::l_paren)) {
// This could be OpenCL vector Literals
if (getLangOpts().OpenCL)
{
diff --git a/clang/lib/Parse/Parser.cpp b/clang/lib/Parse/Parser.cpp
index 18f399aca59e8..2f201cc49cf0b 100644
--- a/clang/lib/Parse/Parser.cpp
+++ b/clang/lib/Parse/Parser.cpp
@@ -1607,7 +1607,8 @@ ExprResult Parser::ParseAsmStringLiteral(bool ForAsmLabel) {
EnterExpressionEvaluationContext ConstantEvaluated(
Actions, Sema::ExpressionEvaluationContext::ConstantEvaluated);
AsmString = ParseParenExpression(ExprType, true /*stopIfCastExpr*/, false,
- CastTy, RParenLoc);
+ false /*ParenKnownToBeNonCast*/, CastTy,
+ RParenLoc);
if (!AsmString.isInvalid())
AsmString = Actions.ActOnConstantExpression(AsmString);
diff --git a/clang/test/Parser/c2x-typeof.c b/clang/test/Parser/c2x-typeof.c
index 9c836dfa6d823..245c127d71e03 100644
--- a/clang/test/Parser/c2x-typeof.c
+++ b/clang/test/Parser/c2x-typeof.c
@@ -42,3 +42,13 @@ _Static_assert(__builtin_offsetof(typeof(s), i) == 0);
_Static_assert(__builtin_offsetof(typeof_unqual(struct S), i) == 0);
_Static_assert(__builtin_offsetof(typeof_unqual(s), i) == 0);
+// Show that typeof and typeof_unqual can be used in the underlying type of an
+// enumeration even when given the type form. Note, this can look like a
+// compound literal expression, which caused GH146351.
+enum E3 : typeof(int) { ThirdZero }; // (int) {}; is not a compound literal!
+enum E4 : typeof_unqual(int) { FourthZero }; // Same here
+
+// Ensure that this invalid construct is diagnosed instead of being treated
+// as typeof((int){ 0 }).
+typeof(int) { 0 } x; // expected-error {{a type specifier is required for all declarations}} \
+ expected-error {{expected identifier or '('}}
|
clang/include/clang/Parse/Parser.h
Outdated
@@ -4210,6 +4215,7 @@ class Parser : public CodeCompletionHandler { | |||
/// \endverbatim | |||
ExprResult ParseParenExpression(ParenParseOption &ExprType, | |||
bool stopIfCastExpr, bool isTypeCast, | |||
bool ParenKnownToBeNonCast, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like it might be semantically simpler to invert the value of this and call it just MightBeCast
? Reasoning about the fact that passing false
here means that ‘it’s not known to be a non-cast’ is a bit headache-inducing at least for me...
Actually on that note, TypeCastState
is a thing. Can we just merge this parameter with the isTypeCast
one and make it a TypeCastState
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about introducing a new enumeration here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually on that note, TypeCastState is a thing. Can we just merge this parameter with the isTypeCast one and make it a TypeCastState?
TypeCastState
is cursed in that it doesn't mean what you might think it means: 77e21fc
So it doesn't mean "this is, is not, or maybe a cast expression", it means "if there's a typo, should we allow type names or not?" I verified this is not vestigial code after delayed typo correction was removed; it's still used for typo correction today. We probably should rename some stuff to make this far more clear though, because your suggestion is what I initially tried to implement and broke approximately every test in the suite.
How about introducing a new enumeration here?
Given that there's three bools in a row, yeah, that's not a bad idea.
How about this for a solution:
- Rename
TypeCastState
toTypoCorrectionTypeBehavior
- Rename
NotTypeCast
,IsTypeCast
, andMaybeTypeCast
toAllowNonTypes,
AllowTypes, and
AllowBoth` - Add a new enumeration
TypeCastState
with membersIsTypeCast
,IsNotTypeCast
, andUnknown
- Change
ParseParenExpression
to take both aTypoCorrectionTypeBehavior
and aTypeCastState
.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it doesn't mean "this is, is not, or maybe a cast expression", it means "if there's a typo, should we allow type names or not?"
Ok, I like to say I’m bad at naming, but that is actually awful. ;Þ
How about this for a solution:
- Rename
TypeCastState
toTypoCorrectionTypeBehavior
- Rename
NotTypeCast
,IsTypeCast
, andMaybeTypeCast
toAllowNonTypes,
AllowTypes, and
AllowBoth`- Add a new enumeration
TypeCastState
with membersIsTypeCast
,IsNotTypeCast
, andUnknown
- Change
ParseParenExpression
to take both aTypoCorrectionTypeBehavior
and aTypeCastState
.
I was about to suggest this so sgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up picking different names for TypeCastState
because that was still confusing.
@@ -2850,7 +2870,8 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr, | |||
isFoldOperator(NextToken().getKind())) { | |||
ExprType = ParenParseOption::FoldExpr; | |||
return ParseFoldExpression(ExprResult(), T); | |||
} else if (isTypeCast) { | |||
} else if (CorrectionBehavior == TypoCorrectionTypeBehavior::AllowTypes) { | |||
// FIXME: This should not be predicated on typo correction behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is worth noting!
This was an unfortunate surprise. However, it's a preexisting defect that I don't have the bandwidth to correct right now.
The old TypeCastState
enumeration was introduced to help with typo correction (77e21fc) and we accidentally ended up using it for parsing decisions:
llvm-project/clang/lib/Parse/ParseExpr.cpp
Line 761 in a63f572
isTypeCast == TypeCastState::IsTypeCast, CastTy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not surprised that that happened given the name of the enum...
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing I thought of: What about C++ edge cases, e.g. both Clang and GCC accept this... which doesn’t quite seem right to me (https://godbolt.org/z/K58eEvG9P):
typeof(int){} x; // Probably parsed as typeof(int{})
@@ -2850,7 +2870,8 @@ Parser::ParseParenExpression(ParenParseOption &ExprType, bool stopIfCastExpr, | |||
isFoldOperator(NextToken().getKind())) { | |||
ExprType = ParenParseOption::FoldExpr; | |||
return ParseFoldExpression(ExprResult(), T); | |||
} else if (isTypeCast) { | |||
} else if (CorrectionBehavior == TypoCorrectionTypeBehavior::AllowTypes) { | |||
// FIXME: This should not be predicated on typo correction behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not surprised that that happened given the name of the enum...
Actually, I’m not sure what we think this is supposed to be; I haven’t checked. |
Actually, that’s just a compound literal; I forgot we supported those in C++. |
It's a reallllly funky compound literal though, so worth explaining for others: That's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM!
Windows failures are unrelated, they're a CI issue tracked by #145703 |
Nope, I'm wrong.
I'll investigate. |
GCC reject this https://godbolt.org/z/YEdnGh8T5 |
GCC rejects it in C mode; it accepts it in C++ mode. |
Yeah, I think there are a few different issues. 1) we should not accept in C. 2) C++ has its own issue in that we accept |
Since we’re on to C++ support now, here’s one more fun case: class A {}; A a;
class B : typeof(A) {};
class C : typeof(a) {}; |
Oh, derp, and we don't accept in C because of this PR, I have a test case for that. :-D |
Good test cases, but let's handle C++ separately to keep the patch reasonable? |
void f() {
^ typeof((void)0) {}; // Ok
^ typeof(void) {}; // Error but should probably be ok
} But yeah, doing all of that in this pr might be too much—unless the changes you’ve already made happen to already fix these cases. |
C++ cases are not changed by this patch, but the Objective-C ones are:
So good call on adding test coverage there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/144/builds/29507 Here is the relevant piece of the build log for the reference
|
We have a parsing helper function which parses either a parenthesized expression or a parenthesized type name. This is used when parsing a unary operator such as sizeof, for example.
The problem this solves is when that construct is ambiguous. Consider:
After we've parsed the 'typeof', what ParseParenExpression() is responsible for is '(int) { F }' which looks like a compound literal expression when it's actually the parens and operand for 'typeof' followed by the enumerator list for the enumeration declaration. Then consider:
After we've parsed 'sizeof', ParseParenExpression is responsible for parsing something grammatically similar to the problematic case.
The solution is to recognize that the expression form of 'typeof' is required to have parentheses. So we know the open and close parens that ParseParenExpression handles must be part of the grammar production for the operator, not part of the operand expression itself.
Fixes #146351