Multiline summary is not split correctly #315

Open
@FlorianGD

I hit this error in CI because I was running from master (for a while that was needed to work with pre-commit version 4 and above).

Some multi-line sentences are split incorrectly. This is the current behavior of the split_summary function:

>>> split_summary(["First sentence is here. Second sentence is long", "and split in two. I even have a third sentence here.", "", "And other text here."])
['First sentence is here.',
 'and split in two. I even have a third sentence here.',
 'Second sentence is long',
 '',
 'And other text here.']

I think this comes from the split_summary function that does this:

    lines[0] = first_sentence
    if rest_text:
        lines.insert(2, rest_text)

and so inserts the rest of the text at the wrong position when a sentence is long enough to continue onto the next line.
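
The effect of the fixed insertion index can be reproduced without the real function. The sketch below hard-codes the split of the first line (the values of first_sentence and rest_text are taken from the example above, not computed by the actual split_summary) and only illustrates where insert(2, ...) lands. The index-1 variant at the end is just a comparison, not a proposed fix.

```python
# Standalone illustration: the split of the first line is hard-coded here,
# it is not produced by the real split_summary implementation.
lines = [
    "First sentence is here. Second sentence is long",
    "and split in two. I even have a third sentence here.",
    "",
    "And other text here.",
]
first_sentence = "First sentence is here."
rest_text = "Second sentence is long"

original = list(lines)
original[0] = first_sentence
if rest_text:
    # Index 2 is *after* the continuation line at index 1, so the start of
    # the second sentence ends up behind its own continuation.
    original.insert(2, rest_text)

# For comparison only: inserting at index 1 keeps the sentence in order.
reordered = list(lines)
reordered[0] = first_sentence
if rest_text:
    reordered.insert(1, rest_text)

print(original)
print(reordered)
```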

I am not familiar with the code base, but maybe something along these lines could work?

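import re
from typing import List

# ABBREVIATIONS is assumed to be the existing collection of abbreviations
# (strings ending in ".") already defined in the code base.


def split_summary(lines: List[str]) -> List[str]:
    """Split multi-sentence summary into the first sentence and the rest."""
    if not lines or not lines[0].strip():
        return lines

    text = lines[0].strip()

    tokens = re.split(r"(\s+)", text)  # Keep whitespace for accurate rejoining
    sentence = []
    i = 0

    # Collect tokens until one ends the first sentence (a "." that is not
    # the end of a known abbreviation).
    while i < len(tokens):
        token = tokens[i]
        sentence.append(token)
        i += 1

        if token.endswith(".") and not any(
            "".join(sentence).strip().endswith(abbr) for abbr in ABBREVIATIONS
        ):
            break

    rest = tokens[i:]
    first_sentence = "".join(sentence).strip()
    rest_text = "".join(rest).strip()

    new_lines = [first_sentence, ""]
    if rest_text:
        new_lines.append(rest_text)
    # Keep the remaining non-empty lines even when the first line held only
    # a single sentence.
    new_lines.extend(line for line in lines[1:] if line)

    return new_lines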

This gives:

>>> split_summary(["First sentence is here. Second sentence is long", "and split in two. I even have a third sentence here.", "", "And other text here."])
['First sentence is here.',
 '',
 'Second sentence is long',
 'and split in two. I even have a third sentence here.',
 'And other text here.']
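
The tokenizing step and the abbreviation guard from the proposed loop can also be tried in isolation. This is a minimal sketch: the ABBREVIATIONS tuple below is an assumed stand-in (the real constant may hold different values), and first_sentence_end is a hypothetical helper name used only for illustration.

```python
import re

# Assumed stand-in for the ABBREVIATIONS constant; the real values live in
# the code base and may differ.
ABBREVIATIONS = ("e.g.", "i.e.", "etc.")


def first_sentence_end(text):
    """Return the tokens and the index just past the first sentence."""
    # The capturing group makes re.split keep the whitespace runs as tokens,
    # so "".join(tokens) reproduces the original text exactly.
    tokens = re.split(r"(\s+)", text)
    seen = []
    for i, token in enumerate(tokens):
        seen.append(token)
        so_far = "".join(seen).strip()
        # A trailing "." ends the sentence unless the text so far ends with
        # a known abbreviation.
        if token.endswith(".") and not any(
            so_far.endswith(abbr) for abbr in ABBREVIATIONS
        ):
            return tokens, i + 1
    return tokens, len(tokens)


tokens, end = first_sentence_end("Use a tool, e.g. black. Then run tests.")
first = "".join(tokens[:end])
rest = "".join(tokens[end:]).strip()
print(first)  # Use a tool, e.g. black.
print(rest)   # Then run tests.
```

Here the "." in "e.g." does not end the sentence, because the text collected so far ends with a listed abbreviation. The split happens at "black." instead.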

I do not know whether the result should be processed further before returning, or whether that is already handled elsewhere in the codebase.

Labels: C: convention (relates to docstring format convention), P: bug (PEP 257 violation or existing functionality that doesn't work as documented), U: high
