Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support comments in array shape #213

Open
wants to merge 2 commits into
base: 1.23.x
Choose a base branch
from

Conversation

shmax
Copy link
Contributor

@shmax shmax commented Sep 17, 2023

This is my first experiment in trying to support comments in array shape:

array{
    // a is for apple
	a: int,
}

array{
    // a is for apple
    // a is also for aardvark
	a: int,
}

Note: as pointed out by @ondrejmirtes , multi-line comments within a block comment aren't valid PHP, so single line comments like this are the best we can do.

@@ -13,7 +14,7 @@ class ArrayShapeNode implements TypeNode

use NodeAttributes;

/** @var ArrayShapeItemNode[] */
/** @var array<ArrayShapeItemNode|CommentNode|IdentifierTypeNode> */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think comments should be part of the AST as nodes. Being in node attributes would match how nikic/PHP-Parser works, and I would like that here too. It would make it easier to work with it, and also support comments everywhere, without BC breaks.

Copy link
Contributor Author

@shmax shmax Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but can you please provide more of a big picture to me, here? ie:

  1. why do we need to retain them at all?
  2. does the printer need to reproduce them?
  3. if so, what about new lines? Single line comments can depend on new lines to prevent them from affecting following content

And a little background about the printer would be helpful, because if that's what's producing the output of dumpType and what we see in error messages, then I imagine you would still want single-line output, in which case comments are just going to add noise, and if we agree on that then it's not clear to me why we shouldn't just ignore comments.

Copy link
Contributor Author

@shmax shmax Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another crack at it:

  • replaced simple // regex with a more robust pattern that captures single and block comments (so they can happen anywhere, as you say)
  • modified parseArrayShape to catch comments. This seems to be necessary because array shape is one of the few node types that does recursive parsing (I had a look at nikic parser, by the way, and they have an architecture that works better for this kind of thing, with all the work of comment-capturing happening in a centralized next method that drives the entire process).

Am I getting warmer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to retain them at all?

It might be useful for the users of the parser to read them. For the same reason nikic/PHP-Parser retains them. And we might need it for the format-preserving printer.

does the printer need to reproduce them?

You can write some tests for this in PrinterTest. If we remove an array shape item, its comment should also be removed. If we change something in an array shape item, its comment should be preserved.

You can also try to write a PrinterTest test case that removes/adds comments to AST nodes.

As for the Printer implementation, you can see the similarities between:

  • https://github.com/nikic/PHP-Parser/blob/f4961b89ac0289455295e2dc2169527098a6e11b/lib/PhpParser/PrettyPrinterAbstract.php#L751-L998 (and also the pArray method)
  • private function printArrayFormatPreserving(array $nodes, array $originalNodes, TokenIterator $originalTokens, int &$tokenIndex, string $parentNodeClass, string $subNodeName): ?string
    {
    $diff = $this->differ->diffWithReplacements($originalNodes, $nodes);
    $mapKey = $parentNodeClass . '->' . $subNodeName;
    $insertStr = $this->listInsertionMap[$mapKey] ?? null;
    $result = '';
    $beforeFirstKeepOrReplace = true;
    $delayedAdd = [];
    $insertNewline = false;
    [$isMultiline, $beforeAsteriskIndent, $afterAsteriskIndent] = $this->isMultiline($tokenIndex, $originalNodes, $originalTokens);
    if ($insertStr === "\n * ") {
    $insertStr = sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);
    }
    foreach ($diff as $i => $diffElem) {
    $diffType = $diffElem->type;
    $newNode = $diffElem->new;
    $originalNode = $diffElem->old;
    if ($diffType === DiffElem::TYPE_KEEP || $diffType === DiffElem::TYPE_REPLACE) {
    $beforeFirstKeepOrReplace = false;
    if (!$newNode instanceof Node || !$originalNode instanceof Node) {
    return null;
    }
    $itemStartPos = $originalNode->getAttribute(Attribute::START_INDEX);
    $itemEndPos = $originalNode->getAttribute(Attribute::END_INDEX);
    if ($itemStartPos < 0 || $itemEndPos < 0 || $itemStartPos < $tokenIndex) {
    throw new LogicException();
    }
    $result .= $originalTokens->getContentBetween($tokenIndex, $itemStartPos);
    if (count($delayedAdd) > 0) {
    foreach ($delayedAdd as $delayedAddNode) {
    $parenthesesNeeded = isset($this->parenthesesListMap[$mapKey])
    && in_array(get_class($delayedAddNode), $this->parenthesesListMap[$mapKey], true);
    if ($parenthesesNeeded) {
    $result .= '(';
    }
    $result .= $this->printNodeFormatPreserving($delayedAddNode, $originalTokens);
    if ($parenthesesNeeded) {
    $result .= ')';
    }
    if ($insertNewline) {
    $result .= $insertStr . sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);
    } else {
    $result .= $insertStr;
    }
    }
    $delayedAdd = [];
    }
    $parenthesesNeeded = isset($this->parenthesesListMap[$mapKey])
    && in_array(get_class($newNode), $this->parenthesesListMap[$mapKey], true)
    && !in_array(get_class($originalNode), $this->parenthesesListMap[$mapKey], true);
    $addParentheses = $parenthesesNeeded && !$originalTokens->hasParentheses($itemStartPos, $itemEndPos);
    if ($addParentheses) {
    $result .= '(';
    }
    $result .= $this->printNodeFormatPreserving($newNode, $originalTokens);
    if ($addParentheses) {
    $result .= ')';
    }
    $tokenIndex = $itemEndPos + 1;
    } elseif ($diffType === DiffElem::TYPE_ADD) {
    if ($insertStr === null) {
    return null;
    }
    if (!$newNode instanceof Node) {
    return null;
    }
    if ($insertStr === ', ' && $isMultiline) {
    $insertStr = ',';
    $insertNewline = true;
    }
    if ($beforeFirstKeepOrReplace) {
    // Will be inserted at the next "replace" or "keep" element
    $delayedAdd[] = $newNode;
    continue;
    }
    $itemEndPos = $tokenIndex - 1;
    if ($insertNewline) {
    $result .= $insertStr . sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);
    } else {
    $result .= $insertStr;
    }
    $parenthesesNeeded = isset($this->parenthesesListMap[$mapKey])
    && in_array(get_class($newNode), $this->parenthesesListMap[$mapKey], true);
    if ($parenthesesNeeded) {
    $result .= '(';
    }
    $result .= $this->printNodeFormatPreserving($newNode, $originalTokens);
    if ($parenthesesNeeded) {
    $result .= ')';
    }
    $tokenIndex = $itemEndPos + 1;
    } elseif ($diffType === DiffElem::TYPE_REMOVE) {
    if (!$originalNode instanceof Node) {
    return null;
    }
    $itemStartPos = $originalNode->getAttribute(Attribute::START_INDEX);
    $itemEndPos = $originalNode->getAttribute(Attribute::END_INDEX);
    if ($itemStartPos < 0 || $itemEndPos < 0) {
    throw new LogicException();
    }
    if ($i === 0) {
    // If we're removing from the start, keep the tokens before the node and drop those after it,
    // instead of the other way around.
    $originalTokensArray = $originalTokens->getTokens();
    for ($j = $tokenIndex; $j < $itemStartPos; $j++) {
    if ($originalTokensArray[$j][Lexer::TYPE_OFFSET] === Lexer::TOKEN_PHPDOC_EOL) {
    break;
    }
    $result .= $originalTokensArray[$j][Lexer::VALUE_OFFSET];
    }
    }
    $tokenIndex = $itemEndPos + 1;
    }
    }
    if (count($delayedAdd) > 0) {
    if (!isset($this->emptyListInsertionMap[$mapKey])) {
    return null;
    }
    [$findToken, $extraLeft, $extraRight] = $this->emptyListInsertionMap[$mapKey];
    if ($findToken !== null) {
    $originalTokensArray = $originalTokens->getTokens();
    for (; $tokenIndex < count($originalTokensArray); $tokenIndex++) {
    $result .= $originalTokensArray[$tokenIndex][Lexer::VALUE_OFFSET];
    if ($originalTokensArray[$tokenIndex][Lexer::VALUE_OFFSET] !== $findToken) {
    continue;
    }
    $tokenIndex++;
    break;
    }
    }
    $first = true;
    $result .= $extraLeft;
    foreach ($delayedAdd as $delayedAddNode) {
    if (!$first) {
    $result .= $insertStr;
    if ($insertNewline) {
    $result .= sprintf('%s%s*%s', $originalTokens->getDetectedNewline() ?? "\n", $beforeAsteriskIndent, $afterAsteriskIndent);
    }
    }
    $result .= $this->printNodeFormatPreserving($delayedAddNode, $originalTokens);
    $first = false;
    }
    $result .= $extraRight;
    }
    return $result;
    }

But you can also see all the places it touches node's comments there. I skipped these places when implementing the printer here because comments weren't a thing back then. But arguably we'll need to replicate the logic with comments here too, but the needs might not be that complex :)

Am I getting warmer?

Yeah, probably :)

Copy link
Contributor Author

@shmax shmax Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look through NIkic's printer; the section dealing with comments spans about 100 lines. As you say, it's vaguely similar to your code. I figure I'm looking at about a week or more of debugging and trying to understand what it means to add/remove shape items, and work out all the kinks. If the printer functionality is going to be a blocker on this PR then I guess I'll have no choice, but given that the tests pass as-is, is there any way we can leave that stuff for another time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's a blocker, otherwise it'd mean I have to implement it myself after merging the PR without it.

This is a serious low-level library, I'd expect an issue here or in Slevomat CS pretty soon after people would start commenting their PHPDoc types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well. I'll dive in on it, thanks much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some progress. Still not sure I understand what I'm doing, but I've added a new printer test that uses the visitor pattern to add a few ArrayShapeItems with comments to a multiline array shape, and got the DiffElem::TYPE_ADD branch of Printer::printArrayFormatPreserving working, at least with single line comments. Whew.

Tomorrow I'll see what I can do about multi-line comments, but in the meantime can you shed any light on these CI errors?

[BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\Node
[BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\Type\TypeNode
[BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\PhpDoc\PhpDocTagValueNode
[BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\PhpDoc\PhpDocChildNode
[BC] ADDED: Method getComments() was added to interface PHPStan\PhpDocParser\Ast\ConstExpr\ConstExprNode

Thanks much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't add methods on interfaces, that's a BC break in case someone implements their own Node.

You always need to access the comments through getAttribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thank you. Updated.

@shmax
Copy link
Contributor Author

shmax commented Sep 19, 2023

I'm having some trouble negotiating with your CI build. I'm getting this error:

Time: 00:03.819, Memory: 80.00 MB

There was 1 PHPUnit test runner deprecation:

1) Your XML configuration validates against a deprecated schema. Migrate your XML configuration using "--migrate-configuration"!

--

There was 1 failure:

1) SlevomatCodingStandard\Sniffs\Commenting\ForbiddenAnnotationsSniffTest::testForbiddenAnnotations
Failed asserting that 9 is identical to 10.

/home/runner/work/phpdoc-parser/phpdoc-parser/slevomat-cs/tests/Sniffs/Commenting/ForbiddenAnnotationsSniffTest.php:24

FAILURES!
Tests: 718, Assertions: 2648, Failures: 1, Deprecations: 1.

So the tests are failing, but the reported error is some kind of slevomat thing? 😕

@ondrejmirtes
Copy link
Member

This is the workflow that fails in your PR: https://github.com/phpstan/phpdoc-parser/blob/1.23.x/.github/workflows/test-slevomat-coding-standard.yml

Slevomat CS is a heavy user of phpdoc-parser so we're testing every phpdoc-parser commit against Slevomat CS, to see that we didn't break anything.

The test in question:

Copy link
Member

@ondrejmirtes ondrejmirtes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert the commented-out tests, they shouldn't change or fail.

{

/** @var string */
protected $text;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -13,4 +13,6 @@ final class Attribute

public const ORIGINAL_NODE = 'originalNode';

public const COMMENT = 'comment';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be COMMENTS / comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thank you 829b5a0

@shmax
Copy link
Contributor Author

shmax commented Sep 19, 2023

Slevomat CS is a heavy user of phpdoc-parser so we're testing every phpdoc-parser commit against Slevomat CS, to see that we didn't break anything.

Oh, that's clever, so you copy the current source of phpdoc-parser into slevomat's own vendor/phpstan folder, then you see what happens, eh? And because I'm now supporting comments within doc blocks, some test it runs related to "forbidden annotations" is getting throw off?

Any idea how to proceed, here?

@shmax
Copy link
Contributor Author

shmax commented Sep 19, 2023

An, yes, I had the tests commented out while I was experimenting with the node-based comment approach. Restored, thank you.

@ondrejmirtes
Copy link
Member

Any idea how to proceed, here?

You should probably replicate what the workflow does locally, which will allow you to debug what's going wrong in the test. It might actually be valid, and we can just update the assertion in Slevomat CS after merging and releasing this PR. But it needs to be investigated.

@shmax
Copy link
Contributor Author

shmax commented Sep 19, 2023

You should probably replicate what the workflow does locally, which will allow you to debug what's going wrong in the test.

Great idea, thanks.

@shmax
Copy link
Contributor Author

shmax commented Sep 19, 2023

Heh, well hurray for debugging. It turns out that the comment-parsing I was doing was getting thrown off by this kind of thing:

* @see https://www.slevomat.cz

I've modified my regex to include a negative lookbehind on ":", and that seemed to do the trick. Thanks much for the insights.

I guess that just leaves the format-preserving stuff, which I will look at this evening.

@shmax shmax marked this pull request as ready for review September 20, 2023 07:38
@ondrejmirtes
Copy link
Member

TBH I'd first make single line comments work perfectly,and only if there is a real demand add multiline comments. We're already inside a comment, we don't need our comments to support multiline comments 😀

@shmax
Copy link
Contributor Author

shmax commented Sep 26, 2023

Fair point, but it was actually fairly easy to get working as all the hard work happens in Comment::getReformattedText, which I was able to lift unchanged from nickic. We'll see, though...

@shmax
Copy link
Contributor Author

shmax commented Sep 26, 2023

With my latest changes, multiline comments seem to work with format-preserving printer:

/**
 * @return array{
 *	c: string,
 *  /* This is a
 *     multiline
 *     comment
 *   */
 *  d: string,
 *  /* Another
 *   * multiline
 *   * comment with leading asterisks
 *   */
 *  e: string
 * }
 */

I also have the DiffElem::TYPE_ADD branch of Printer::printArrayFormatPreserving working.

I think that just leaves the $delayedAdd processing (which I haven't gotten as far as understanding, yet), and possibly removal of nodes, but I didn't see any removal stuff on array shapes already happening in the printer tests, so let me know if you want it.

@shmax
Copy link
Contributor Author

shmax commented Sep 26, 2023

Oh, I also reverted the trailing comment style:

/** 
 @return array{
    a: int // comment
  }

Nikic parser doesn't seem to support it, and I suppose it could arguably be considered ambiguous (does it go with the item to the left, or the item on the following line?), so I figure it's best to leave it for another day.

@shmax
Copy link
Contributor Author

shmax commented Sep 26, 2023

Took a shot at implementing delayed add. At this point I think I'm ready for a round of reviews. I'm sure there are going to be a lot of use cases I missed, so let me know if you have any suggestions on how to improve the tests.

@shmax shmax requested a review from ondrejmirtes September 27, 2023 00:43
@@ -160,6 +163,7 @@ private function generateRegexp(): string
self::TOKEN_CLOSE_CURLY_BRACKET => '\\}',

self::TOKEN_COMMA => ',',
self::TOKEN_COMMENT => '((?<![:/])\/\/[^\n]*|\/\*(?!\*)[^*]*\*+([^\/][^*]*\*+)*\/)',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How exactly this huge regex works? Is there a danger of it consuming more content than it should?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation:

  1. ((?<![:/])\/\/[^\n]*|\/\*[\s\S]*?\*\/): This entire regular expression is composed of two main alternatives separated by a pipe |.

  2. First alternative: (?<![:/])\/\/[^\n]*

  • (?<![:/]): Negative lookbehind to ensure that // is not preceded by : or /, thus avoiding URLs.
  • \/\/: Matches the start of a single-line comment //.
  • [^\n]*: Matches any character except a newline \n, effectively matching the rest of the comment line until the end.
  1. Second alternative: \/\*[\s\S]*?\*\/
  • \/\*: Matches the start of a multi-line comment /*.
  • [\s\S]*?: Non-greedy match for any character, including newlines, zero or more times. This matches the content of the multi-line comment.
  • \*\/: Matches the end of a multi-line comment */.

Is there a danger of it consuming more content than it should?

I don't think so, but I guess we won't know for sure until it happens. I'm available to support if it comes up.

$tokens->currentTokenLine(),
$tokens->currentTokenIndex()
);
$tokens->next();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there's one comment after another? /* c *//* d*/

Copy link
Contributor Author

@shmax shmax Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough: 5d9be65

Although I don't really know what it means, or how this all gets consumed by other libraries.

/**
* @param list<Ast\Comment> $comments
*/
public function appendComments(Ast\Node $type, array $comments = []): Ast\Node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. should be private
  2. I dislike optional parameters, make it required

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. cee33d9

continue;
}

$this->appendComments($item, [new Ast\Comment($tokens->currentTokenValue(), $tokens->currentTokenLine(), $tokens->currentTokenIndex())]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a pity that this PR adds comments only specifically to array shape items. It should be achieved somehow generally so that all types that support multiline syntax can support comments automatically.

These types come to mind:

  1. Object shapes (very similar but there are different methods for them in TypeParser)
  2. Conditional types
  3. Union and intersection types
  4. Generic types
  5. Callables

Again, it should be possible to achieve this automatically without modifying parsing each of these types which can be very error-prone to implement.

I see that you added reading a comment in enrichWithAttributes. What happens, what kind of syntax support we lose, if we just leave the modification in enrichWithAttributes, and remove every change about parsing array shapes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try, but I think it's complicated by architectural differences between this library and PHP-Parser:

PHP-Parser has a central getNextToken hook in Lexer, and everything must go through it to get tokens. Comments are plucked from the token list and appended to an attributes structure that is pushed onto a stack, and used to decorate nodes later.

phpdoc-parser requires us to manually call enrichWithAttributes over and over.

Anyway, let me poke around a bit more. Maybe we can do something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and remove every change about parsing array shapes?

I don't think we can really do that part of it. I just tried it, and the parser travels from parseArrayShape to parseArrayShapeItem, and then finally crashes in parseArrayShapeKey. I could maybe adjust parseArrayShapeKey to not die on comments, but by that point we've already gone wrong, because if I understand correctly we want the comment to be associated with the Ast\Type\ArrayShapeItemNode node, and not the key node.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against rearchitecting this library (in a separate PR) to make our job easier here in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that could help with general comment parsing, but for the particular case of array shape items I think we're going to have to continue special-casing (because we want to pair comments specifically with items, and not keys or values).

But if you want to start thinking about how to remove the need for all those calls to enrichWithAttributes it could help with general comment parsing.

Copy link
Contributor Author

@shmax shmax Oct 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intersection types

I tried to get intersections working with comments, but multiline syntax doesn't seem to work; this doesn't work with or without comments:

/**
 * @return 
 *    Foo
 *      &
 *    Bar
 * /

I can wok on that if you really want it, but otherwise I'd be much happier to leave it for another time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conditional types

This also won't work with comments and format-preserving printer in its current form, because it doesn't have a single items property of array type, so it will either need to be refactored to work more like ArrayShapeType and ObjectShapeType, or I will need to do special handling in the printer. I'm always interested in the challenge, but would love to save this for another day if you don't mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callables

This one (nearly--had to eat extra lines) works out of the box:

b00d20a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Union

This one is nearly doable, but the code in Nikic parser seems to forbid it:

// We go multiline if the original code was multiline,
                // or if it's an array item with a comment above it.
                if ($insertStr === ', ' &&
                    ($this->isMultiline($origNodes) || $arrItem->getComments())
                ) {
                    $insertStr = ',';
                    $insertNewline = true;
                }

So, nikic parser only prints comments when it's a newline, and we only decide to do a new line when there are comments, or the insert string is ,, but for union the insert string is |. I can tweak the logic if you want, but I think it will make us diverge from nikic parser.

At this point, I think I have done just about everything I can. Please advise.

Copy link
Contributor Author

@shmax shmax Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these both annotating a: string with the same comment?

Finally, I did more debugging with Nikic parser, and no, it seems that a comment that comes after an array item is not paired with the item that precedes it; it is associated with the following line:

$items = [
    'a' => 'foo',  // puppies are cute
    'b' => 'bar'
];

In this example, the "puppies" comment is bound to the 'b' => 'bar' line.

@shmax shmax force-pushed the array-shape-comments branch from 5d56c30 to cc41c61 Compare September 28, 2023 18:39
@ondrejmirtes
Copy link
Member

I've got an idea - every $tokens->tryConsumeTokenType(Lexer::TOKEN_PHPDOC_EOL); in TypeParser should be replaced by a special method on TokenIterator that would return an array of comments which we could use in enrichWithAttributes. Because comments only make sense at the end of the line, it makes sense to pair it with consuming EOL.

@shmax
Copy link
Contributor Author

shmax commented Sep 29, 2023

I've got an idea - every $tokens->tryConsumeTokenType(Lexer::TOKEN_PHPDOC_EOL); in TypeParser should be replaced by a special method on TokenIterator that would return an array of comments which we could use in enrichWithAttributes. Because comments only make sense at the end of the line, it makes sense to pair it with consuming EOL.

Sure, can try something like that, but that's one step removed from just doing whatever you're going to do directly in enrichWithAttributes, isn't it? And I'm not sure that a comment is always at the end of the line, for example:

/*****************************
 * @param /* TODO: remove */  array $data
 */

@shmax shmax marked this pull request as draft September 30, 2023 18:03
@shmax
Copy link
Contributor Author

shmax commented Sep 30, 2023

Moving this back to draft, as I'm discovering general problems around format-preserving of single line vs inline comments. Hopefully won't take too long to sort out...

@shmax shmax marked this pull request as ready for review October 1, 2023 20:03
@shmax
Copy link
Contributor Author

shmax commented Oct 1, 2023

general problems around format-preserving of single line

False alarm. I confirmed behavior against Nikic parser, and the behavior with comments in this branch seems to be correct. I had the idea that inline comments should stay inline when format-preserving printing, but apparently not; all comment types force a newline after a comment, and then another after the item it is paired with.

@shmax shmax requested a review from ondrejmirtes October 3, 2023 14:59
@ondrejmirtes
Copy link
Member

About @param /* TODO: remove */ array $data - I don't think we want to support this because the comment is not part of the type, it precedes it. And from this it's also obvious why I want to start by supporting end-line // comment only :)


/** @phpstan-impure */
public function tryConsumeTokenType(int $tokenType): bool
public function tryConsumeTokenType(int $tokenType, bool $consumeAll = false): bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like bool arguments and I really don't like optional arguments. Please make the scenario where you'd call this with true as a separate method with a different name. Thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, let me revisit this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better? 1c3f514

@shmax
Copy link
Contributor Author

shmax commented Oct 4, 2023

About @param /* TODO: remove */ array $data - I don't think we want to support this because the comment is not part of the type, it precedes it. And from this it's also obvious why I want to start by supporting end-line // comment only :)

I'll do whatever you want to move things forward, but the sample we're discussing here doesn't really have an impact on the array-based stuff I've been working on (I've confirmed over and over that those are working perfectly in alignment with nikic parser, for both single-line and block comments).

I'm not sure what you expect to happen for /* TODO: remove */ $array sample, but from my testing it does indeed get bound to the following item, in this case the identifier. The comment doesn't break anything, but the format-preserving printer doesn't currently know how to handle these kind of comments. I can continue to investigate that if you like...

edit: to clarify, such a comment does appear in the printer output, but not if you "edit" the identifier node in this use case to include a new comment. I'm having trouble finding an equivalent sample in the nikic parser tests to analyze, but its format-preserving printer has a few branches that yours doesn't (eg. pStmts).

@ondrejmirtes
Copy link
Member

/* ... */ inside /** PHPDoc is not even valid PHP: https://3v4l.org/R7jBd

So please remove that from the parser along with its complexity and leave only the trailing // comment. Thank you.

@shmax
Copy link
Contributor Author

shmax commented Oct 5, 2023

Well, that's certainly a fair point! I don't think it will reduce complexity at all, but obviously if it breaks PHP it will have to go. Let me see what I can do...

@shmax
Copy link
Contributor Author

shmax commented Oct 5, 2023

Here you go: bfa5991

@ondrejmirtes
Copy link
Member

What about all of this?

Screenshot 2023-10-05 at 15 53 42

@shmax
Copy link
Contributor Author

shmax commented Oct 5, 2023

Sure: 3b6ff9a

/**
* @return Comment[]
*/
public function flushComments(): array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this PR is starting to be in a pretty good shape. I still have to go through the many test cases and think about it very hard (and try to break your logic :))

Every time there's a new stateful logic introduced like here with the comments array, I try to think how we can prevent it being used wrongly.

I think TokenIterator should throw an exception if the comments weren't flushed (the array is not empty):

  1. Definitely when consuming TOKEN_END
  2. Maybe even more often, basically every time we assume the array should be empty. Like when consuming some other token than EOL and maybe others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a look through the code, too, but so far I don't actually see any calls to consumeTokenType with TOKEN_END as an argument.

My next thought was to do a check before returning from TypeParser::parse, but that one is recursive, so TypeParser doesn't really know when it's done (without some further refactoring).

PhpDocParser::parse has some potential, but while it is possible to have comments remaining when the parsing is done, it's not clear to me if it's an error. For example, this seems like it should be valid to me:

/**
 *  @return int|false
 *
 * // can return false on failure
 * /

Since we always attach each comment to the following node, and there are no more nodes after the comment, we technically have a leftover comment, but no cause to throw an exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, so this comment made me hear a loud sound of breaking tableware :D

We need to make sure that comments outside of types, outside of TypeParser, are still parsed as they are today!

So for example:

/** @param int $a // this is a description */

Must become a description of ParamTagValueNode. Please be aware that there are two differents implementations of consuming descriptions: parseOptionalDescription and parseOptionalDescriptionAfterDoctrineTag. So make sure to update and test both. Thank you.

In this case:

/**
 *  @return int|false
 *
 * // can return false on failure
 * /

This must become a standalone PhpDocTextNode as today.

Can you make sure they still do?

Copy link
Contributor Author

@shmax shmax Oct 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make sure they still do?

Did they ever? I didn't know about parseOptionalDescription or parseOptionalDescriptionAfterDoctrineTag, but I checked out the 1.23.x branch, stuck a breakpoint in each of them, and ran the tests. Neither breakpoint was hit.

edit: disregard. Somehow I wound up inside the build-cs folder.

Can you provide a breaking test for me? Feel free to push a commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/** @param int $a // this is a description */
Must become a description of ParamTagValueNode

By the way, I'm not sure about that one. None of the samples I see in the tests look like that. They all look like this:

/** 
@param int $a this is a description 
*/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise we're already inside a comment. There's no structural difference between @param int $foo the number of foos to count and @param int $foo // TODO: come up with description.

There is a functional difference. Let's rewrite it a different way:

// with description
$tag = [
  'name' => '@param',
  'type' => `int`,
  'parameterName' => '$foo,
 'description` => 'the number of foos to count'
];
// with comment
$tag = [
  'name' => '@param',
  'type' => `int`,
  'parameterName' => '$foo,
 'description` => '' // TODO: come up with a description
];

Is that more clear? The first one has a description, the second one doesn't.

// outside of PHPDoc types must mean the same thing as before. If not for nothing else then at least for backward compatibility.

Can you elaborate on this? I'm not clear on what you feel is broken. Comments are currently ignored, which is what is supposed to happen. How exactly have I broken backwards-compatibility? All the existing tests pass, and comments are neatly ignored.

If you want to add the kind of complexity you have in mind later that's your business, but for now do we really have a problem?!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About backward compatibility - I've described the current behaviour which we need to keep in these tests: 0bb2fe4

AFAIK they are failing in your PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How exactly have I broken backwards-compatibility?

Ah, maybe I understand. You're saying that there is code currently floating around out in the wild where there are already comments assigned to descriptions and being interpreted as descriptions, and your concern is that after this change they will lose their description status, is that right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can't change the AST like that. It'd also make altering these comments more difficult and diferent for the users of format-preserving printer, most notably Slevomat CS.

Copy link
Contributor Author

@shmax shmax Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. At this point I am going to abandon this PR. I tried. I feel that the behavior in this PR is correct. I understand it may "break" descriptions out in the wild, but I would argue that the code out in the wild is incorrect, and this fixes it. That happens sometimes in the software development cycle, and the solution is to do a major version release, not add tons of complexity trying to achieve backwards compatibility with existing behavior that is wrong.

I'm going to use my own fork, which I really hate to do, but I've done everything I can within reason to solve this problem rationally, and I'm not getting support. Good luck.

shmax and others added 2 commits October 9, 2023 08:43
use older syntax

linting + stanning

revert

missing file

remove typehint

lint

stan

stan

comments as attributes

revert some stuff

revert some stuff

missing file

styles fix

make public, change const name

restore tests

undo multiline

ignore single lines comments preceded by :

add comments to arrayshapeitems, not arrayshape

whoops

fix various things

support comment after comma

add test

fix order

remove unused

remove unused

restore

start on format-preserving printer comment support

remove addComments

support multiline comments in add flow

remove comments to the right stuff

support changing of comments

support editing of comments

delayed add

fix tests

remove comments

handle multiple comments

clean up

lint

lint

add nowdoc

stanning

linting, stanning

add visibility

use nowdoc in a few more places

set up test for adding comment to object

lint

remove debug thing

gather comments in TokenIterator

move comment handling to TokenIterator

use nowdoc

add consumeAll flag

add more tests for array shape comments

add test file for object

Allow asserting the type of `$this`

Update actions/checkout action to v4

Make the CI job pass after upgrading PHPStan to v1.10.34

simplify/unify parseGeneric method

fix/unify callable template parsing with EOL

Allow conditional type in callable return type

fix template

Revert "fix template"

This reverts commit 655d968.

restore baseline

add tests for comments on callable

introduce tryConsumeTokenTypeAll

single-line comments only

simplify getReformattedText

lint

linting

lint
@shmax shmax force-pushed the array-shape-comments branch 2 times, most recently from 8012067 to 4dd3d9a Compare October 9, 2023 15:55
{
yield [
'Comment after @param',
'/** @param int $a // this is a description */',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a description, it is a commented description, meaning that as far as the parser is concerned there is no description here.

'Comment on a separate line',
'/**' . PHP_EOL .
' * @param int $a' . PHP_EOL .
' * // this is a comment' . PHP_EOL .
Copy link
Contributor Author

@shmax shmax Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a text node. It is a commented text node., meaning that there is no tag or descriptive content here.

@ondrejmirtes
Copy link
Member

I'm sorry that you feel this way and that you're not going to finish this PR. But I need to do what I feel is right for this project and its users, and also to keep the support burden low and maintainability high.

@shmax
Copy link
Contributor Author

shmax commented Oct 18, 2023

Sigh. I forgot I can't just fork the parser, because you have everything all wrapped up in a phar.

@shmax shmax reopened this Oct 18, 2023
@ethernidee
Copy link

@shmax, multiline comments would be really great feature. We are grateful to you for all the efforts.

@shmax
Copy link
Contributor Author

shmax commented Oct 24, 2023

@ethernidee thanks, I appreciate the words of encouragement. Let me summarize the situation:

  1. I opened a PR with a very simple solution, which was to ignore array shape comments
  2. @ondrejmirtes decided that if we are going to have comments, then we must do them the same way that nikic parser does them, and that comments must work everywhere, not just in array shape.
  3. I got that working, and now @ondrejmirtes has declared that they should NOT work as one would expect when used in lines with tag names, because that would not be "backwards compatible"

I can't spent the rest of my life trying to hit a moving target, so I decided to just document my own code the old-fashioned way (mkdocs), but I may return to this at some point if I get bored. In the meantime, feel free to take this code and open your own PR.

@ethernidee
Copy link

To my mind, multiline comments are necessary everywhere. Array shape, object shape, regular tags. It's normal practice to be able to describe something using ascii-graphics, a few sentences or some kind of a list. The same way, as regular comments work.

@shmax
Copy link
Contributor Author

shmax commented Oct 25, 2023

multiline comments are necessary everywhere

Unfortunately, we've discovered that multiline comments aren't legal inside a doc block comment (PHP can't parse them).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants