Description
I saw the error in CI because I was using master (which was needed for a while to work with pre-commit version 4 and above).
Some multi-line sentences are split incorrectly. This is the current behavior of the split_summary function:
>>> split_summary(["First sentence is here. Second sentence is long", "and split in two. I even have a third sentence here.", "", "And other text here."])
['First sentence is here.',
 'and split in two. I even have a third sentence here.',
 'Second sentence is long',
 '',
 'And other text here.']
I think this comes from the split_summary function, which does this:
lines[0] = first_sentence
if rest_text:
    lines.insert(2, rest_text)
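and so inserts the rest of the text at the wrong place when the summary spans more than one line: the rest ends up after the second input line instead of directly after the first sentence.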
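To make the index problem concrete, here is a minimal stand-alone sketch. The helper `insert_rest` is hypothetical and only mimics the three quoted lines on a copy of the list; `first_sentence` and `rest_text` are passed in by hand rather than computed, and the one-line input with a blank separator is an assumed typical case, not something taken from the code base.

```python
def insert_rest(lines, first_sentence, rest_text):
    """Mimic the quoted snippet on a copy of `lines` (hypothetical helper)."""
    result = list(lines)
    result[0] = first_sentence
    if rest_text:
        result.insert(2, rest_text)
    return result


# Summary on a single line followed by a blank line (assumed typical input):
# index 2 falls right after the blank separator, so the order is preserved.
one_line = insert_rest(
    ["First sentence. Second sentence.", "", "Body text."],
    "First sentence.",
    "Second sentence.",
)

# Summary wrapped over two lines (the input from this report):
# index 2 now falls after the continuation line, so the order is broken.
two_lines = insert_rest(
    [
        "First sentence is here. Second sentence is long",
        "and split in two. I even have a third sentence here.",
        "",
        "And other text here.",
    ],
    "First sentence is here.",
    "Second sentence is long",
)

print(one_line)
print(two_lines)
```

The second call reproduces the ordering shown in the report, which suggests the fixed index only works when the summary occupies exactly one line.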
I am not familiar with the code base, but maybe something along those lines could work?
import re
from typing import List

# ABBREVIATIONS is assumed to be the module-level collection of
# abbreviations already used by the existing implementation.

def split_summary(lines) -> List[str]:
    """Split multi-sentence summary into the first sentence and the rest."""
    if not lines or not lines[0].strip():
        return lines
    text = lines[0].strip()
    tokens = re.split(r"(\s+)", text)  # Keep whitespace for accurate rejoining
    sentence = []
    rest = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        sentence.append(token)
        # Stop at the first token ending in "." that is not an abbreviation.
        if token.endswith(".") and not any(
            "".join(sentence).strip().endswith(abbr) for abbr in ABBREVIATIONS
        ):
            i += 1
            break
        i += 1
    rest = tokens[i:]
    first_sentence = "".join(sentence).strip()
    rest_text = "".join(rest).strip()
    # Build a new list instead of inserting at a fixed index.
    new_lines = [first_sentence, ""]
    if rest_text:
        new_lines.append(rest_text)
    new_lines.extend(line for line in lines[1:] if line)
    return new_lines
This gives:
>>> split_summary(["First sentence is here. Second sentence is long", "and split in two. I even have a third sentence here.", "", "And other text here."])
['First sentence is here.',
 '',
 'Second sentence is long',
 'and split in two. I even have a third sentence here.',
 'And other text here.']
I do not know whether the result should be processed further before returning, or whether that is already taken into account elsewhere in the codebase.
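One detail in the proposed output that may matter for that question: the blank line before 'And other text here.' is gone, because the final `extend` call skips empty lines. The small sketch below isolates that filter on the remaining input lines. The variable names are made up for illustration, and whether the separator should survive depends on downstream handling that is not shown here.

```python
# Lines after the first one, taken from the example input in this report.
tail = [
    "and split in two. I even have a third sentence here.",
    "",
    "And other text here.",
]

# Same filter as in the proposal: empty lines are dropped.
filtered = [line for line in tail if line]

# Without the filter, the blank paragraph separator is preserved.
unfiltered = list(tail)

print(filtered)
print(unfiltered)
```

If the blank separator is meaningful to later steps, dropping the `if line` condition would be one option to consider, though that is a guess rather than a confirmed requirement.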