Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatter: Keep right-hanging comments aligned #7684

Open
warsaw opened this issue Sep 27, 2023 · 33 comments
Open

Formatter: Keep right-hanging comments aligned #7684

warsaw opened this issue Sep 27, 2023 · 33 comments
Labels
formatter Related to the formatter style How should formatted code look

Comments

@warsaw
Copy link

warsaw commented Sep 27, 2023

ruff format follows black's style for right-hanging comments (i.e. inline comments), but that leads to poorly readable results, especially when a series of consecutive lines uses comment alignment by column. Blue fixes this readability problem.

Let's start with this file:

def foo():
    x = 1                                         # this comment
    why = 2                                       # aligns with this comment
    zebra = 3                                     # and this one

As you can see, every # on each of these lines appears at the same column. How does ruff reformat the line?

def foo():
    x = 1  # this comment
    why = 2  # aligns with this comment
    zebra = 3  # and this one

This is the way black does it, i.e. by jamming all comment starts to two space characters between the last code character and the #. This destroys the alignment.

How does blue format the file? Nothing changes! Blue preserves the number of spaces between the last code character and the # with the assumption that the spacing is deliberate and done for readability.

ruff format should follow blue's rule here.

@MichaReiser MichaReiser added formatter Related to the formatter wish Not on the current roadmap; maybe in the future labels Sep 28, 2023
@MichaReiser MichaReiser modified the milestone: Formatter: Stable Sep 28, 2023
@warsaw
Copy link
Author

warsaw commented Oct 3, 2023

Now that the quote handling (#7310) has been fixed in 0.0.292, this is the last formatting incompatibility that I've been able to find between ruff and blue. Once this ticket is resolved I can officially start promoting ruff as a replacement for blue. Great work y'all and thank you!

@zanieb
Copy link
Member

zanieb commented Oct 3, 2023

I have also been annoyed with this when using Black. I'm not sure if there's a downside to supporting this. Ideally it would only apply if they start out aligned. Additionally, it'd be cool if it shrunk to the minimum compatible distance for alignment — but that's a little magical.

@charliermarsh
Copy link
Member

Enforcing this conditionally seems really hard... and enforcing this globally by default would be a big deviation from Black. I suspect it would need to be a global setting, if we choose to support it.

@JelleZijlstra
Copy link
Contributor

For what it's worth, I have consistently rejected this sort of feature in Black, as I don't think it leads to a consistent, stable formatting style. Obviously, what style Ruff chooses is up to you.

Barry's proposed rule is that the number of spaces before the hash gets preserved. But what if someone comes along and decides that zebras are worth more:

def foo():
    x = 1                                         # this comment
    why = 2                                       # aligns with this comment
    zebra = 31                                     # and this one

Now either the code gets misaligned (ugly), or they have to manually fiddle with the spaces, which defeats the point of an autoformatter.

Now, possibly the formatter could be made smart enough to detect that the comments are misaligned, and fix the spacing on the zebra line to get them aligned again.

But now what if we decide that zebras are worth a lot more:

def foo():
    x = 1                                         # this comment
    why = 2                                       # aligns with this comment
    zebra = 300000000000000000000000000000000000000  # and this one

Then, to get the lines aligned again, the formatter would have to change the x and why line, creating noise in the diff.

@warsaw
Copy link
Author

warsaw commented Oct 3, 2023

There's a little bit of missing context that might be helpful. In python-mode you can set a comment column, and then hit M-; (I'm not sure if that's a default keybinding, but it's set to comment-dwim for me) and Emacs will do it's best to ensure that hanging comment starts at that column. Of course in @JelleZijlstra 's second example, it will put a minimum number of spaces there but they won't align.

So @JelleZijlstra 's first example, it's relatively easy, albeit manual, to re-indent everything to an aligning column. Emacs only looks at the current, line so it doesn't actively try to keep things lined up, but the effect is usually that they are. The preservation of whitespace rule that blue implements works well with this approach.

It's never going to be perfect, but I'd argue that blue's rule is better than the step-wise always-2-spaces rule that black (and currently ruff) implements because that leads to difficult to read lines. It's worse with black though because you don't even have the option of manually aligning those hanging comments -- black will just crush them.

@zanieb
Copy link
Member

zanieb commented Oct 3, 2023

Hm... a comment-column setting is an interesting approach. Overall this sounds pretty difficult to implement and I agree that manually fiddling with the alignment would be a disappointing outcome. I appreciate all of the extra context!

@konstin
Copy link
Member

konstin commented Oct 4, 2023

I'd prefer not to support this, i think comments directly after the statements or alternatively leading own line comments (where they are column aligned) are fine enough to read. For the the case of not modifying whitespace i agree with @JelleZijlstra that would defeat the point of an autoformatter while for the case of column aligning i don't think the possible improvement warrants introducing a column layout when we don't use one elsewhere¹.

This is also bad from a point of locality, currently the formatting of one statement does not influence the formatting of other statements (we see those comments as trailing on the statement).


¹With column layout, i mean e.g.

class Foo:
    x     = 1                  # this comment
    why   = 22                 # aligns with this comment
    zebra = 333                # and this one

@warsaw
Copy link
Author

warsaw commented Oct 4, 2023

Hm... a comment-column setting is an interesting approach

Thinking about this overnight, this could be the best way forward. It would be a local-to-the-line only setting -- no need for global parsing or alignment. It also, um, aligns more with the Emacs behavior. ruff would do its best to preserve comment-column but in the "big zebra" case above, it would just push the hanging comment out to the right. That's fine because the user would still have a lot of flexibility in getting the format they want.

Straw man proposal:

  • comment-column = 60 - best effort to align the hanging comment # on column 60, the Emacs behavior
  • comment-column = +2 - a relative setting, i.e. the current black/ruff default
  • comment-column = false - preserve existing whitespace, i.e. the blue behavior

@effigies
Copy link

effigies commented Oct 8, 2023

The top two reasons I use blue and not black are single quotes and not messing with comment spacing. I don't mind bumping the spaces to 2, but removing additional spaces is a real pain. If I'm using more, there's pretty much always a reason.

Here's an example that even blue currently damages (blue fixed the single-line expression case). These are pairs of field names and numpy dtype codes, followed by comments:

 header_dtd = [
-    ('sizeof_hdr', 'i4'),      # 0; must be 348
-    ('data_type', 'S10'),      # 4; unused
-    ('db_name', 'S18'),        # 14; unused
-    ('extents', 'i4'),         # 32; unused
-    ('session_error', 'i2'),   # 36; unused
-    ('regular', 'S1'),         # 38; unused
-    ('dim_info', 'u1'),        # 39; MRI slice ordering code
-    ('dim', 'i2', (8,)),       # 40; data array dimensions
+    ('sizeof_hdr', 'i4'),  # 0; must be 348
+    ('data_type', 'S10'),  # 4; unused
+    ('db_name', 'S18'),  # 14; unused
+    ('extents', 'i4'),  # 32; unused
+    ('session_error', 'i2'),  # 36; unused
+    ('regular', 'S1'),  # 38; unused
+    ('dim_info', 'u1'),  # 39; MRI slice ordering code
+    ('dim', 'i2', (8,)),  # 40; data array dimensions
 ]

I think it's hard to argue that this change makes things more readable. I understand that preserving an aligned column is difficult, and I really don't care if an autoformatter chooses not to try. But if I go through the trouble of making an aligned column of comments, it's frustrating for an autoformatter to destroy it and give no option to just leave it alone. The PR I opened for blue about this (grantjenks/blue#83) still enforces at least two spaces, but otherwise leaves inline comment spacing alone.

@dhirschfeld
Copy link

Destroying carefully crafted comment spacing can make code harder to parse/understand which in turn can make it easier for bugs to creep in.

Some people don't use comment spacing to make their code easier to visually parse and for them keeping the option of enforcing 2-spaces is fine. For those who do use spacing it would be nice to have the option to at least turn off any messing with comment spacing at all. I'd much rather manually align my comments (or not) than have an auto-formatter make my code harder to understand and maintain.

The ability to easily parse the code and therefore more easily spot bugs is also at the core of Jarrod's request to not mess with numpy spacing (which I'm also interested in, for the same reasons).

I'd be fine with switches such as --no-align-comments, --no-align-numpy to globally turn off autoformatting for those use cases.

@MichaReiser
Copy link
Member

MichaReiser commented Oct 16, 2023

How frequent do you use these carefully crafted comments? IMO, disabling formatting might be the best option if they are rare.

Considering:

header_dtd = [
    ('sizeof_hdr', 'i4'),      # 0; must be 348
    ('data_type', 'S10'),      # 4; unused
    ('db_name', 'S18'),        # 14; unused
    ('extents', 'i4'),         # 32; unused
    ('session_error', 'i2'),   # 36; unused
    ('regular', 'S1'),         # 38; unused
    ('dim_info', 'u1'),        # 39; MRI slice ordering code
    ('dim', 'i2', (8,)),       # 40; data array dimensions
 ]

It's unclear how the formatter should format the comments once a line exceeds the configured line width

header_dtd = [
    ('sizeof_hdr', 'i4'),      # 0; must be 348
    ('data_type', 'S10'),      # 4; unused
    ('db_name', 'S18'),        # 14; unused
    ('extents', 'i4'),         # 32; unused
    ('session_error', 'i2'),   # 36; unused
    ('regular', 'S1'),         # 38; unused
    ('dim_info', 'u1'),        # 39; MRI slice ordering code
    (
		'a very long header that exceeds the line width so that it breaks', 'i2', (8,)
	),       # 40; data array dimensions
 ]

@dhirschfeld
Copy link

How frequent do you use these carefully crafted comments? IMO, disabling formatting might be the best option if they are rare.

For me, they're rare enough that disabling formatting on a case-by-case basis is ok. I guess there's a threshold where too many # nofmt comments gets distracting and a global config would be preferred.

@effigies
Copy link

Yes, I can disable formatting for these structures, but there are nice features of formatting, such as uniformizing indentations and quote style, and it would be nice not to have to throw out the baby with the bathwater.

To be clear, I do not want the formatter to calculate the correct columns. Bumping 0/1 spaces to 2 is a completely fine rule to apply, but otherwise I just want them left alone. If I had indicated to the formatter to leave comments alone, I would expect the tuple to get reformatted and the comment spacing kept the same. If I then decided to adjust the spacing before the comment, I would expect the formatter to leave the new spacing alone.

FWIW, if your situation occurred, I would probably reformat it to:

header_dtd = [
    ('sizeof_hdr', 'i4'),      # 0; must be 348
    ...,
    (
        'a very long header that exceeds the line width so that it breaks',
        'i2',
        (8,),
    ),                         # 40; data array dimensions
 ]

Just as a meta-comment, it really is odd to me that black et al have decided that using spacing in comments to improve readability needs to be done away with, like arguments over tabs vs spaces or where to put newlines in parameter lists. Comments are the one bit of code whose only purpose is communication with other humans (okay, modulo pragma: and fmt: and so on), so it strikes me as particularly funny that everybody seems to be saying "We can't come up with an algorithm to make these look good all the time, so let's prevent humans from doing it themselves."

@warsaw
Copy link
Author

warsaw commented Oct 16, 2023

it strikes me as particularly funny that everybody seems to be saying "We can't come up with an algorithm to make these look good all the time, so let's prevent humans from doing it themselves."

💯

What do you think about my previous suggestion? I think it covers all the use cases.

@effigies
Copy link

What do you think about my previous suggestion? I think it covers all the use cases.

Yes, that would work fine for me.

@MichaReiser
Copy link
Member

it strikes me as particularly funny that everybody seems to be saying "We can't come up with an algorithm to make these look good all the time, so let's prevent humans from doing it themselves."

We want to support the best possible comment formatting that matches users' intuitions. It's just that comment formatting is surprisingly hard.I would estimate that about 30% of the development effort of our formatter is related to handling comments as best as we can. Comments are hard because they can appear in any position, there's no formal definition of what a comment comments, the output must be stable (on the first try), the formatter must support all possible comment placements, and misplacing comments can easily lead to syntax errors.

What do you think about my #7684 (comment)? I think it covers all the use cases.

Assuming we align all comments at, e.g. column 60. What happens if the comment has a width of 29, exceeding the line width by 1? For example, let's pretend the comment below is aligned at column 60 and does exceed the line width.

a = [1, 2]           # comment that exceeds the line width when aligned to column 60

The comment would fit fine if it isn't aligned at column 60. What's your expected behavior?

@warsaw
Copy link
Author

warsaw commented Oct 17, 2023

The comment would fit fine if it isn't aligned at column 60. What's your expected behavior?

Speaking just for myself, I would expect that ruff format would leave the comment alone, but ruff check would complain. I actually think it's an anti-pattern to have long hanging comments, and would personally rewrite such comments as a separate block preceding the code. My personal preference is that hanging comments usually end up being pragmas or type checker hints, although I do occasionally have short, lined-up "blocks" like the examples above.

Aside: I wish there was a better way to signal intent to static checkers, rather than pragma/type comments.

@effigies
Copy link

I would expect that ruff format would leave the comment alone, but ruff check would complain.

Yes. There are plenty of ruff checks that are not auto-fixable, and that hasn't been an existential problem. If the problem is this difficult, the obvious solution is to work on something that is tractable.

Clearly some people like black's behavior, so I am not saying do not include it. Just let it be optional, please. Or you can go further and allow some other specific behaviors, as Barry suggests.

Anyway, I feel like I've taken up enough space in this thread. I like ruff, I like autoformatting. Thanks for all the effort, whatever the end result of this conversation is.

@MichaReiser MichaReiser added style How should formatted code look and removed wish Not on the current roadmap; maybe in the future labels Oct 27, 2023
@toppk
Copy link

toppk commented Nov 8, 2023

if we're still in the spit-balling ideas phase, I'd like to use a double hash ## for @warsaw comment-column = false behavior (in addition to, not as a replacement) . For me it's only a few spots where I use comment columns, and I'd prefer having something easy enough to remember, not a prama or conf setting. And I'm happy that ruff format is fixing the whitespace in 99% of cases.

fwiw, I came up with this as the most pleasant to my eyes workaround

class Foo(Enum):
    AAA = -2  #      > for the a's
    BBBBBBBB = 0  #  > NOT USED
    CCCC = 1  #      > only when you need sea

@warsaw
Copy link
Author

warsaw commented Nov 8, 2023

Not sure I would use "most pleasant" over "least horrible" but I get your point! 😆

I suppose you could do this too:

class Foo(Enum):
    AAA = -2  #      # for the a's
    BBBBBBBB = 0  #  # NOT USED
    CCCC = 1  #      # only when you need sea

which almost looks right!

@toppk
Copy link

toppk commented Nov 12, 2023

Not sure I would use "most pleasant" over "least horrible" but I get your point! 😆

which almost looks right!

that was too uncanny valley for me 🤪

@nuno-andre
Copy link

I would generalize this issue as vertical alignment (colons, assignments, and comments), including cases such as:

@dataclass
class Example:
    an_attr:      int
    another_attr: str | None
    whatever:     dict[str, str]


class Colors(IntEnum):
    BLACK  = 0  # a comment
    RED    = 1  # another comment
    YELLOW = 2

I use a VSCode extension for vertical alignment, Better Align, which has a very convenient configuration method.

So, the above formatting would be:

colon = [ -1, 1 ]
assignment = [ 1, 1 ]
comment = 2

@randolf-scholz
Copy link
Contributor

The real solution to this problem are elastic tabstops, which would make this problem disappear overnight by using a single variable length tab for separation. But as long as editors are stuck in the 60s by interpreting tabs always as 1 + col % tab_length spaces, there is no good solution.

@warsaw
Copy link
Author

warsaw commented Jan 11, 2024

Pretty interesting read, but I'm not sure how this helps this particular issue in practice, given there are probably more editors in use out there than lawyers (or more relevant perhaps - coders?) with opinions 🤣 .

@mcrumiller
Copy link

New to the party this one really bugs me. I have so many cases where I have hanging aligned comments:

coefficients = [
   a + b + c,     # c0 - used for X
   a * c,         # c1 - used for Y
   1 /2 + d,      # c2 - used for Z
]

Am I correct reading through the issue that there is no way to add some type of exclusion to prevent inline comments from being excluded from formatting?

@rotten
Copy link

rotten commented Aug 15, 2024

I'm curious if there are any new thoughts or progress on this? Is it likely to have a solution in the near future or far future? Debating ruff for a new project instead of blue. I'm undecided on whether to make the jump.

@mcrumiller
Copy link

I'm of the opinion that comments, including the whitespace leading up to them, should not be touched by the formatter. All of the workarounds supplied in this thread lead to really ugly comment code. This one issue is my current biggest gripe with auto formatters and I find myself smattering # fmt: skip everywhere to deal with this issue.

@warsaw
Copy link
Author

warsaw commented Aug 15, 2024

Debating ruff for a new project instead of blue. I'm undecided on whether to make the jump.

As one of the original maintainers of blue, I'll say that despite this issue still being unresolved, I've already made the jump despite the occasional # fmt toggle. Still, I'd love to get an ETA on this ticket.

@ncoghlan
Copy link

ncoghlan commented Oct 18, 2024

I find myself smattering # fmt: skip everywhere to deal with this issue.

Is there a place to put the # fmt: skip to deal with this in a dataclass or TypedDict body?

I'm having to resort to a # fmt: off/# fmt: on pair for those, and it doesn't look nice (especially for small classes with only a few fields)

(It's also somewhat annoying that # fmt: off (preserve the trailing comment alignment) doesn't work, since the additional trailing text prevents ruff from recognising the directive)

@MichaReiser
Copy link
Member

Would you mind sharing a concrete example. It would help me understand what you're running into

@b-destoop
Copy link

I actually do this as shown below. It is far from perfect, and if the length of any of the characters before the # changes, the comment has to be updated manually, but as long as I am working somewhere else in the file (and auto-formatting along the way), nothing is changed.

def foo():
x = 1 # this comment
why = 2 # aligns with this comment
zebra = 3 # and this one

@ncoghlan
Copy link

@MichaReiser If you were asking about my dataclass/TypeDict comment, https://github.com/lmstudio-ai/venvstacks/blob/7877f743fd3df4d0d97c81d2f637f64ab73f1c34/src/venvstacks/stacks.py#L177 is one of the simpler cases:

class EnvironmentLockMetadata(TypedDict):
    """Details of the last time this environment was locked."""

    # fmt: off
    locked_at: str          # ISO formatted date/time value
    requirements_hash: str  # Uses "algorithm:hexdigest" format
    lock_version: int       # Auto-incremented from previous lock metadata
    # fmt: on

Putting # fmt: skip in the header line after the trailing : didn't turn off the automatic formatting in the class body - an off/on pair (as shown) seems to be the only way to do that.

For the original feature request here, I wonder if a special trailing comment alignment directive could solve the ambiguity problem:

class EnvironmentLockMetadata(TypedDict):
    """Details of the last time this environment was locked."""

    # fmt:                  # align comments
    locked_at: str          # ISO formatted date/time value
    requirements_hash: str  # Uses "algorithm:hexdigest" format
    lock_version: int       # Auto-incremented from previous lock metadata

Until the next blank line, trailing comments would be aligned with the trailing comment on the # fmt: ... line.

The trailing comment alignment behaviour could then be specified as:

  • # align comments: comments are pushed right if necessary to get a two space gap after the longest line, but never pulled left (keeps diffs as small as possible without losing trailing comment alignment)
  • # align +2: comments are always aligned with a two space gap between the end of the longest line and its trailing comment (moves trailing comments as far left as possible without losing alignment, at the expense of sometimes getting noisier diffs)

@ncoghlan
Copy link

ncoghlan commented Jan 6, 2025

The topic of aligned assignments came up again on discuss.python.org.

Considering the "alignment header" idea in that light, I would revise my suggestion to be:

# fmt: [= align][# align [+N]]
  • If = align is present, assignments have their = aligned with the one in the formatting header (and the formatter complains if the assignment targets don't all fit)
  • If # align is present, trailing comments have their leading # aligned with the one in the formatting header (and the formatter complains if the non-comment parts of the lines don't all fit)
  • If # align +N is present, trailing comments, including the one in the formatting header, have their leading # aligned to N spaces after the end of the longest line

The column formatting would apply until the next blank line.

Edit: Alternatively, if there needs to be a directive name up front for the formatter to make sense of the header comment:

# fmt: align [= (...|+N)][# (...|+N)]
  • omitting both alignment fields is equivalent to specifying # fmt: align # +2
  • If = ... is present, assignments have their = aligned with the one in the formatting header (and the formatter complains if the assignment targets don't all fit). For multiple assignment targets in a line, the last one before the assigned value is the one that gets aligned.
  • If = +N is present, assignments, including the one in the formatting header, have their last = aligned to N spaces after the end of the longest assignment target. # fmt: align is NOT considered an assignment target for purposes of this calculation, so the formatting header may not fully align if all the assignment targets are shorter than 12 characters.
  • If # ... is present, trailing comments have their leading # aligned with the one in the formatting header (and the formatter complains if the non-comment parts of the lines don't all fit)
  • If # +N is present, trailing comments, including the one in the formatting header, have their leading # aligned to N spaces after the end of the longest line. # fmt: align, # fmt: align = ... and # fmt: align = +N are NOT considered lines for purposes of this calculation, so the formatting header may not fully align if all the other lines in the block are shorter than that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter style How should formatted code look
Projects
None yet
Development

No branches or pull requests