-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support comments in array shape #213
base: 1.23.x
Are you sure you want to change the base?
Conversation
src/Ast/Type/ArrayShapeNode.php
Outdated
@@ -13,7 +14,7 @@ class ArrayShapeNode implements TypeNode | |||
|
|||
use NodeAttributes; | |||
|
|||
/** @var ArrayShapeItemNode[] */ | |||
/** @var array<ArrayShapeItemNode|CommentNode|IdentifierTypeNode> */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think comments should be part of the AST as nodes. Being in node attributes would match how nikic/PHP-Parser
works, and I would like that here too. It would make it easier to work with it, and also support comments everywhere, without BC breaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but can you please provide more of a big picture to me, here? ie:
- why do we need to retain them at all?
- does the printer need to reproduce them?
- if so, what about new lines? Single line comments can depend on new lines to prevent them from affecting following content
And a little background about the printer would be helpful, because if that's what's producing the output of dumpType
and what we see in error messages, then I imagine you would still want single-line output, in which case comments are just going to add noise, and if we agree on that then it's not clear to me why we shouldn't just ignore comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took another crack at it:
- replaced simple // regex with a more robust pattern that captures single and block comments (so they can happen anywhere, as you say)
- modified
parseArrayShape
to catch comments. This seems to be necessary because array shape is one of the few node types that does recursive parsing (I had a look at nikic parser, by the way, and they have an architecture that works better for this kind of thing, with all the work of comment-capturing happening in a centralizednext
method that drives the entire process).
Am I getting warmer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to retain them at all?
It might be useful for the users of the parser to read them. For the same reason nikic/PHP-Parser retains them. And we might need it for the format-preserving printer.
does the printer need to reproduce them?
You can write some tests for this in PrinterTest. If we remove an array shape item, its comment should also be removed. If we change something in an array shape item, its comment should be preserved.
You can also try to write a PrinterTest test case that removes/adds comments to AST nodes.
As for the Printer implementation, you can see the similarities between:
- https://github.com/nikic/PHP-Parser/blob/f4961b89ac0289455295e2dc2169527098a6e11b/lib/PhpParser/PrettyPrinterAbstract.php#L751-L998 (and also the
pArray
method) phpdoc-parser/src/Printer/Printer.php
Lines 506 to 676 in 9f854d2
private function printArrayFormatPreserving(array $nodes, array $originalNodes, TokenIterator $originalTokens, int &$tokenIndex, string $parentNodeClass, string $subNodeName): ?string { $diff = $this->differ->diffWithReplacements($originalNodes, $nodes); $mapKey = $parentNodeClass . '->' . $subNodeName; $insertStr = $this->listInsertionMap[$mapKey] ?? null; $result = ''; $beforeFirstKeepOrReplace = true; $delayedAdd = []; $insertNewline = false; [$isMultiline, $beforeAsteriskIndent, $afterAsteriskIndent] = $this->isMultiline($tokenIndex, $originalNodes, $originalTokens); if ($insertStr === "\n * ") { $insertStr = sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent); } foreach ($diff as $i => $diffElem) { $diffType = $diffElem->type; $newNode = $diffElem->new; $originalNode = $diffElem->old; if ($diffType === DiffElem::TYPE_KEEP || $diffType === DiffElem::TYPE_REPLACE) { $beforeFirstKeepOrReplace = false; if (!$newNode instanceof Node || !$originalNode instanceof Node) { return null; } $itemStartPos = $originalNode->getAttribute(Attribute::START_INDEX); $itemEndPos = $originalNode->getAttribute(Attribute::END_INDEX); if ($itemStartPos < 0 || $itemEndPos < 0 || $itemStartPos < $tokenIndex) { throw new LogicException(); } $result .= $originalTokens->getContentBetween($tokenIndex, $itemStartPos); if (count($delayedAdd) > 0) { foreach ($delayedAdd as $delayedAddNode) { $parenthesesNeeded = isset($this->parenthesesListMap[$mapKey]) && in_array(get_class($delayedAddNode), $this->parenthesesListMap[$mapKey], true); if ($parenthesesNeeded) { $result .= '('; } $result .= $this->printNodeFormatPreserving($delayedAddNode, $originalTokens); if ($parenthesesNeeded) { $result .= ')'; } if ($insertNewline) { $result .= $insertStr . sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent); } else { $result .= $insertStr; } } $delayedAdd = []; } $parenthesesNeeded = isset($this->parenthesesListMap[$mapKey]) && in_array(get_class($newNode), $this->parenthesesListMap[$mapKey], true) && !in_array(get_class($originalNode), $this->parenthesesListMap[$mapKey], true); $addParentheses = $parenthesesNeeded && !$originalTokens->hasParentheses($itemStartPos, $itemEndPos); if ($addParentheses) { $result .= '('; } $result .= $this->printNodeFormatPreserving($newNode, $originalTokens); if ($addParentheses) { $result .= ')'; } $tokenIndex = $itemEndPos + 1; } elseif ($diffType === DiffElem::TYPE_ADD) { if ($insertStr === null) { return null; } if (!$newNode instanceof Node) { return null; } if ($insertStr === ', ' && $isMultiline) { $insertStr = ','; $insertNewline = true; } if ($beforeFirstKeepOrReplace) { // Will be inserted at the next "replace" or "keep" element $delayedAdd[] = $newNode; continue; } $itemEndPos = $tokenIndex - 1; if ($insertNewline) { $result .= $insertStr . sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent); } else { $result .= $insertStr; } $parenthesesNeeded = isset($this->parenthesesListMap[$mapKey]) && in_array(get_class($newNode), $this->parenthesesListMap[$mapKey], true); if ($parenthesesNeeded) { $result .= '('; } $result .= $this->printNodeFormatPreserving($newNode, $originalTokens); if ($parenthesesNeeded) { $result .= ')'; } $tokenIndex = $itemEndPos + 1; } elseif ($diffType === DiffElem::TYPE_REMOVE) { if (!$originalNode instanceof Node) { return null; } $itemStartPos = $originalNode->getAttribute(Attribute::START_INDEX); $itemEndPos = $originalNode->getAttribute(Attribute::END_INDEX); if ($itemStartPos < 0 || $itemEndPos < 0) { throw new LogicException(); } if ($i === 0) { // If we're removing from the start, keep the tokens before the node and drop those after it, // instead of the other way around. $originalTokensArray = $originalTokens->getTokens(); for ($j = $tokenIndex; $j < $itemStartPos; $j++) { if ($originalTokensArray[$j][Lexer::TYPE_OFFSET] === Lexer::TOKEN_PHPDOC_EOL) { break; } $result .= $originalTokensArray[$j][Lexer::VALUE_OFFSET]; } } $tokenIndex = $itemEndPos + 1; } } if (count($delayedAdd) > 0) { if (!isset($this->emptyListInsertionMap[$mapKey])) { return null; } [$findToken, $extraLeft, $extraRight] = $this->emptyListInsertionMap[$mapKey]; if ($findToken !== null) { $originalTokensArray = $originalTokens->getTokens(); for (; $tokenIndex < count($originalTokensArray); $tokenIndex++) { $result .= $originalTokensArray[$tokenIndex][Lexer::VALUE_OFFSET]; if ($originalTokensArray[$tokenIndex][Lexer::VALUE_OFFSET] !== $findToken) { continue; } $tokenIndex++; break; } } $first = true; $result .= $extraLeft; foreach ($delayedAdd as $delayedAddNode) { if (!$first) { $result .= $insertStr; if ($insertNewline) { $result .= sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent); } } $result .= $this->printNodeFormatPreserving($delayedAddNode, $originalTokens); $first = false; } $result .= $extraRight; } return $result; }
But you can also see all the places it touches node's comments there. I skipped these places when implementing the printer here because comments weren't a thing back then. But arguably we'll need to replicate the logic with comments here too, but the needs might not be that complex :)
Am I getting warmer?
Yeah, probably :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a look through NIkic's printer; the section dealing with comments spans about 100 lines. As you say, it's vaguely similar to your code. I figure I'm looking at about a week or more of debugging and trying to understand what it means to add/remove shape items, and work out all the kinks. If the printer functionality is going to be a blocker on this PR then I guess I'll have no choice, but given that the tests pass as-is, is there any way we can leave that stuff for another time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's a blocker, otherwise it'd mean I have to implement it myself after merging the PR without it.
This is a serious low-level library, I'd expect an issue here or in Slevomat CS pretty soon after people would start commenting their PHPDoc types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well. I'll dive in on it, thanks much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some progress. Still not sure I understand what I'm doing, but I've added a new printer test that uses the visitor pattern to add a few ArrayShapeItems
with comments to a multiline array shape, and got the DiffElem::TYPE_ADD
branch of Printer::printArrayFormatPreserving
working, at least with single line comments. Whew.
Tomorrow I'll see what I can do about multi-line comments, but in the meantime can you shed any light on these CI errors?
[BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\Node
[BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\Type\TypeNode
[BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\PhpDoc\PhpDocTagValueNode
[BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\PhpDoc\PhpDocChildNode
[BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\ConstExpr\ConstExprNode
Thanks much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't add methods on interfaces, that's a BC break in case someone implements their own Node.
You always need to access the comments through getAttribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thank you. Updated.
I'm having some trouble negotiating with your CI build. I'm getting this error:
So the tests are failing, but the reported error is some kind of slevomat thing? 😕 |
This is the workflow that fails in your PR: https://github.com/phpstan/phpdoc-parser/blob/1.23.x/.github/workflows/test-slevomat-coding-standard.yml Slevomat CS is a heavy user of phpdoc-parser so we're testing every phpdoc-parser commit against Slevomat CS, to see that we didn't break anything. The test in question: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert the commented-out tests, they shouldn't change or fail.
src/Ast/Comment.php
Outdated
{ | ||
|
||
/** @var string */ | ||
protected $text; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/Ast/Attribute.php
Outdated
@@ -13,4 +13,6 @@ final class Attribute | |||
|
|||
public const ORIGINAL_NODE = 'originalNode'; | |||
|
|||
public const COMMENT = 'comment'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be COMMENTS
/ comments
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thank you 829b5a0
Oh, that's clever, so you copy the current source of phpdoc-parser into slevomat's own vendor/phpstan folder, then you see what happens, eh? And because I'm now supporting comments within doc blocks, some test it runs related to "forbidden annotations" is getting throw off? Any idea how to proceed, here? |
An, yes, I had the tests commented out while I was experimenting with the node-based comment approach. Restored, thank you. |
You should probably replicate what the workflow does locally, which will allow you to debug what's going wrong in the test. It might actually be valid, and we can just update the assertion in Slevomat CS after merging and releasing this PR. But it needs to be investigated. |
Great idea, thanks. |
Heh, well hurray for debugging. It turns out that the comment-parsing I was doing was getting thrown off by this kind of thing:
I've modified my regex to include a negative lookbehind on ":", and that seemed to do the trick. Thanks much for the insights. I guess that just leaves the format-preserving stuff, which I will look at this evening. |
TBH I'd first make single line comments work perfectly,and only if there is a real demand add multiline comments. We're already inside a comment, we don't need our comments to support multiline comments 😀 |
Fair point, but it was actually fairly easy to get working as all the hard work happens in |
With my latest changes, multiline comments seem to work with format-preserving printer: /**
* @return array{
* c: string,
* /* This is a
* multiline
* comment
* */
* d: string,
* /* Another
* * multiline
* * comment with leading asterisks
* */
* e: string
* }
*/ I also have the I think that just leaves the |
Oh, I also reverted the trailing comment style: /**
@return array{
a: int // comment
} Nikic parser doesn't seem to support it, and I suppose it could arguably be considered ambiguous (does it go with the item to the left, or the item on the following line?), so I figure it's best to leave it for another day. |
Took a shot at implementing delayed add. At this point I think I'm ready for a round of reviews. I'm sure there are going to be a lot of use cases I missed, so let me know if you have any suggestions on how to improve the tests. |
src/Lexer/Lexer.php
Outdated
@@ -160,6 +163,7 @@ private function generateRegexp(): string | |||
self::TOKEN_CLOSE_CURLY_BRACKET => '\\}', | |||
|
|||
self::TOKEN_COMMA => ',', | |||
self::TOKEN_COMMENT => '((?<![:/])\/\/[^\n]*|\/\*(?!\*)[^*]*\*+([^\/][^*]*\*+)*\/)', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How exactly this huge regex works? Is there a danger of it consuming more content than it should?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explanation:
-
((?<![:/])\/\/[^\n]*|\/\*[\s\S]*?\*\/)
: This entire regular expression is composed of two main alternatives separated by a pipe |. -
First alternative:
(?<![:/])\/\/[^\n]*
(?<![:/])
: Negative lookbehind to ensure that//
is not preceded by:
or/
, thus avoiding URLs.\/\/
: Matches the start of a single-line comment//
.[^\n]*
: Matches any character except a newline\n
, effectively matching the rest of the comment line until the end.
- Second alternative:
\/\*[\s\S]*?\*\/
\/\*
: Matches the start of a multi-line comment/*
.[\s\S]*?
: Non-greedy match for any character, including newlines, zero or more times. This matches the content of the multi-line comment.\*\/
: Matches the end of a multi-line comment*/
.
Is there a danger of it consuming more content than it should?
I don't think so, but I guess we won't know for sure until it happens. I'm available to support if it comes up.
src/Parser/TypeParser.php
Outdated
$tokens->currentTokenLine(), | ||
$tokens->currentTokenIndex() | ||
); | ||
$tokens->next(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there's one comment after another? /* c *//* d*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough: 5d9be65
Although I don't really know what it means, or how this all gets consumed by other libraries.
src/Parser/TypeParser.php
Outdated
/** | ||
* @param list<Ast\Comment> $comments | ||
*/ | ||
public function appendComments(Ast\Node $type, array $comments = []): Ast\Node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- should be private
- I dislike optional parameters, make it required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. cee33d9
src/Parser/TypeParser.php
Outdated
continue; | ||
} | ||
|
||
$this->appendComments($item, [new Ast\Comment($tokens->currentTokenValue(), $tokens->currentTokenLine(), $tokens->currentTokenIndex())]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a pity that this PR adds comments only specifically to array shape items. It should be achieved somehow generally so that all types that support multiline syntax can support comments automatically.
These types come to mind:
- Object shapes (very similar but there are different methods for them in TypeParser)
- Conditional types
- Union and intersection types
- Generic types
- Callables
Again, it should be possible to achieve this automatically without modifying parsing each of these types which can be very error-prone to implement.
I see that you added reading a comment in enrichWithAttributes
. What happens, what kind of syntax support we lose, if we just leave the modification in enrichWithAttributes, and remove every change about parsing array shapes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try, but I think it's complicated by architectural differences between this library and PHP-Parser
:
PHP-Parser
has a central getNextToken
hook in Lexer
, and everything must go through it to get tokens. Comments are plucked from the token list and appended to an attributes structure that is pushed onto a stack, and used to decorate nodes later.
phpdoc-parser
requires us to manually call enrichWithAttributes
over and over.
Anyway, let me poke around a bit more. Maybe we can do something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and remove every change about parsing array shapes?
I don't think we can really do that part of it. I just tried it, and the parser travels from parseArrayShape
to parseArrayShapeItem
, and then finally crashes in parseArrayShapeKey
. I could maybe adjust parseArrayShapeKey
to not die on comments, but by that point we've already gone wrong, because if I understand correctly we want the comment to be associated with the Ast\Type\ArrayShapeItemNode
node, and not the key node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not against rearchitecting this library (in a separate PR) to make our job easier here in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that could help with general comment parsing, but for the particular case of array shape items I think we're going to have to continue special-casing (because we want to pair comments specifically with items, and not keys or values).
But if you want to start thinking about how to remove the need for all those calls to enrichWithAttributes
it could help with general comment parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intersection types
I tried to get intersections working with comments, but multiline syntax doesn't seem to work; this doesn't work with or without comments:
/**
* @return
* Foo
* &
* Bar
* /
I can wok on that if you really want it, but otherwise I'd be much happier to leave it for another time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conditional types
This also won't work with comments and format-preserving printer in its current form, because it doesn't have a single items
property of array type, so it will either need to be refactored to work more like ArrayShapeType
and ObjectShapeType
, or I will need to do special handling in the printer. I'm always interested in the challenge, but would love to save this for another day if you don't mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Union
This one is nearly doable, but the code in Nikic parser seems to forbid it:
// We go multiline if the original code was multiline,
// or if it's an array item with a comment above it.
if ($insertStr === ', ' &&
($this->isMultiline($origNodes) || $arrItem->getComments())
) {
$insertStr = ',';
$insertNewline = true;
}
So, nikic parser only prints comments when it's a newline, and we only decide to do a new line when there are comments, or the insert string is ,
, but for union the insert string is |
. I can tweak the logic if you want, but I think it will make us diverge from nikic parser.
At this point, I think I have done just about everything I can. Please advise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these both annotating a: string with the same comment?
Finally, I did more debugging with Nikic parser, and no, it seems that a comment that comes after an array item is not paired with the item that precedes it; it is associated with the following line:
$items = [
'a' => 'foo', // puppies are cute
'b' => 'bar'
];
In this example, the "puppies" comment is bound to the 'b' => 'bar'
line.
5d56c30
to
cc41c61
Compare
I've got an idea - every |
Sure, can try something like that, but that's one step removed from just doing whatever you're going to do directly in
|
Moving this back to draft, as I'm discovering general problems around format-preserving of single line vs inline comments. Hopefully won't take too long to sort out... |
False alarm. I confirmed behavior against Nikic parser, and the behavior with comments in this branch seems to be correct. I had the idea that inline comments should stay inline when format-preserving printing, but apparently not; all comment types force a newline after a comment, and then another after the item it is paired with. |
About |
src/Parser/TokenIterator.php
Outdated
|
||
/** @phpstan-impure */ | ||
public function tryConsumeTokenType(int $tokenType): bool | ||
public function tryConsumeTokenType(int $tokenType, bool $consumeAll = false): bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like bool
arguments and I really don't like optional arguments. Please make the scenario where you'd call this with true
as a separate method with a different name. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, let me revisit this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better? 1c3f514
I'll do whatever you want to move things forward, but the sample we're discussing here doesn't really have an impact on the array-based stuff I've been working on (I've confirmed over and over that those are working perfectly in alignment with nikic parser, for both single-line and block comments). I'm not sure what you expect to happen for edit: to clarify, such a comment does appear in the printer output, but not if you "edit" the identifier node in this use case to include a new comment. I'm having trouble finding an equivalent sample in the nikic parser tests to analyze, but its format-preserving printer has a few branches that yours doesn't (eg. |
So please remove that from the parser along with its complexity and leave only the trailing |
Well, that's certainly a fair point! I don't think it will reduce complexity at all, but obviously if it breaks PHP it will have to go. Let me see what I can do... |
Here you go: bfa5991 |
Sure: 3b6ff9a |
/** | ||
* @return Comment[] | ||
*/ | ||
public function flushComments(): array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this PR is starting to be in a pretty good shape. I still have to go through the many test cases and think about it very hard (and try to break your logic :))
Every time there's a new stateful logic introduced like here with the comments array, I try to think how we can prevent it being used wrongly.
I think TokenIterator should throw an exception if the comments weren't flushed (the array is not empty):
- Definitely when consuming TOKEN_END
- Maybe even more often, basically every time we assume the array should be empty. Like when consuming some other token than EOL and maybe others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a look through the code, too, but so far I don't actually see any calls to consumeTokenType
with TOKEN_END
as an argument.
My next thought was to do a check before returning from TypeParser::parse
, but that one is recursive, so TypeParser
doesn't really know when it's done (without some further refactoring).
PhpDocParser::parse
has some potential, but while it is possible to have comments remaining when the parsing is done, it's not clear to me if it's an error. For example, this seems like it should be valid to me:
/**
* @return int|false
*
* // can return false on failure
* /
Since we always attach each comment to the following node, and there are no more nodes after the comment, we technically have a leftover comment, but no cause to throw an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, so this comment made me hear a loud sound of breaking tableware :D
We need to make sure that comments outside of types, outside of TypeParser, are still parsed as they are today!
So for example:
/** @param int $a // this is a description */
Must become a description of ParamTagValueNode. Please be aware that there are two differents implementations of consuming descriptions: parseOptionalDescription
and parseOptionalDescriptionAfterDoctrineTag
. So make sure to update and test both. Thank you.
In this case:
/**
* @return int|false
*
* // can return false on failure
* /
This must become a standalone PhpDocTextNode as today.
Can you make sure they still do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make sure they still do?
Did they ever? I didn't know about parseOptionalDescription
or parseOptionalDescriptionAfterDoctrineTag
, but I checked out the 1.23.x
branch, stuck a breakpoint in each of them, and ran the tests. Neither breakpoint was hit.
edit: disregard. Somehow I wound up inside the build-cs
folder.
Can you provide a breaking test for me? Feel free to push a commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** @param int $a // this is a description */
Must become a description of ParamTagValueNode
By the way, I'm not sure about that one. None of the samples I see in the tests look like that. They all look like this:
/**
@param int $a this is a description
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise we're already inside a comment. There's no structural difference between @param int $foo the number of foos to count and @param int $foo // TODO: come up with description.
There is a functional difference. Let's rewrite it a different way:
// with description
$tag = [
'name' => '@param',
'type' => `int`,
'parameterName' => '$foo,
'description` => 'the number of foos to count'
];
// with comment
$tag = [
'name' => '@param',
'type' => `int`,
'parameterName' => '$foo,
'description` => '' // TODO: come up with a description
];
Is that more clear? The first one has a description, the second one doesn't.
// outside of PHPDoc types must mean the same thing as before. If not for nothing else then at least for backward compatibility.
Can you elaborate on this? I'm not clear on what you feel is broken. Comments are currently ignored, which is what is supposed to happen. How exactly have I broken backwards-compatibility? All the existing tests pass, and comments are neatly ignored.
If you want to add the kind of complexity you have in mind later that's your business, but for now do we really have a problem?!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About backward compatibility - I've described the current behaviour which we need to keep in these tests: 0bb2fe4
AFAIK they are failing in your PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How exactly have I broken backwards-compatibility?
Ah, maybe I understand. You're saying that there is code currently floating around out in the wild where there are already comments assigned to descriptions and being interpreted as descriptions, and your concern is that after this change they will lose their description status, is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can't change the AST like that. It'd also make altering these comments more difficult and diferent for the users of format-preserving printer, most notably Slevomat CS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. At this point I am going to abandon this PR. I tried. I feel that the behavior in this PR is correct. I understand it may "break" descriptions out in the wild, but I would argue that the code out in the wild is incorrect, and this fixes it. That happens sometimes in the software development cycle, and the solution is to do a major version release, not add tons of complexity trying to achieve backwards compatibility with existing behavior that is wrong.
I'm going to use my own fork, which I really hate to do, but I've done everything I can within reason to solve this problem rationally, and I'm not getting support. Good luck.
use older syntax linting + stanning revert missing file remove typehint lint stan stan comments as attributes revert some stuff revert some stuff missing file styles fix make public, change const name restore tests undo multiline ignore single lines comments preceded by : add comments to arrayshapeitems, not arrayshape whoops fix various things support comment after comma add test fix order remove unused remove unused restore start on format-preserving printer comment support remove addComments support multiline comments in add flow remove comments to the right stuff support changing of comments support editing of comments delayed add fix tests remove comments handle multiple comments clean up lint lint add nowdoc stanning linting, stanning add visibility use nowdoc in a few more places set up test for adding comment to object lint remove debug thing gather comments in TokenIterator move comment handling to TokenIterator use nowdoc add consumeAll flag add more tests for array shape comments add test file for object Allow asserting the type of `$this` Update actions/checkout action to v4 Make the CI job pass after upgrading PHPStan to v1.10.34 simplify/unify parseGeneric method fix/unify callable template parsing with EOL Allow conditional type in callable return type fix template Revert "fix template" This reverts commit 655d968. restore baseline add tests for comments on callable introduce tryConsumeTokenTypeAll single-line comments only simplify getReformattedText lint linting lint
8012067
to
4dd3d9a
Compare
{ | ||
yield [ | ||
'Comment after @param', | ||
'/** @param int $a // this is a description */', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not a description, it is a commented description, meaning that as far as the parser is concerned there is no description here.
'Comment on a separate line', | ||
'/**' . PHP_EOL . | ||
' * @param int $a' . PHP_EOL . | ||
' * // this is a comment' . PHP_EOL . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a text node. It is a commented text node., meaning that there is no tag or descriptive content here.
I'm sorry that you feel this way and that you're not going to finish this PR. But I need to do what I feel is right for this project and its users, and also to keep the support burden low and maintainability high. |
Sigh. I forgot I can't just fork the parser, because you have everything all wrapped up in a phar. |
@shmax, multiline comments would be really great feature. We are grateful to you for all the efforts. |
@ethernidee thanks, I appreciate the words of encouragement. Let me summarize the situation:
I can't spent the rest of my life trying to hit a moving target, so I decided to just document my own code the old-fashioned way (mkdocs), but I may return to this at some point if I get bored. In the meantime, feel free to take this code and open your own PR. |
To my mind, multiline comments are necessary everywhere. Array shape, object shape, regular tags. It's normal practice to be able to describe something using ascii-graphics, a few sentences or some kind of a list. The same way, as regular comments work. |
Unfortunately, we've discovered that multiline comments aren't legal inside a doc block comment (PHP can't parse them). |
This is my first experiment in trying to support comments in array shape:
Note: as pointed out by @ondrejmirtes , multi-line comments within a block comment aren't valid PHP, so single line comments like this are the best we can do.