
Fix: Tests broken by Python 3.13.6+ html.parser changes - Exclude title elements from text wrapping #116

Open
natsukium wants to merge 1 commit into alan-turing-institute:main from natsukium:fix/python3.13.6


@natsukium

Thank you for this great project! I tried to use ReadabiliPy with Python 3.13.7 and found that tests were failing, so I investigated the cause and implemented a fix.

Context

Python 3.13.6+ introduced stricter HTML5 compliance in html.parser, which causes the ReadabiliPy test suite to fail.

Problem

  • The wrap_bare_text function wraps ALL bare text in <p> tags, including text inside <title> elements
  • This creates invalid HTML: <title><p>Text</p></title>
  • Python 3.13.6+ with stricter HTML parsing serializes this as <title>&lt;p&gt;Text&lt;/p&gt;</title> (escaping the invalid nested tags)
  • Tests fail because expected output has <title><p>Text</p></title> but actual has escaped version
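
To see the parser change in isolation, the stdlib-only sketch below records the events html.parser reports for the problematic markup. EventRecorder and parse_events are illustrative names, not part of ReadabiliPy. Assuming the behavior described above, an older interpreter should report a separate ("start", "p") event, while 3.13.6+ should report the whole "<p>Text</p>" as text data inside <title>.

```python
import sys
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record start tags, end tags and text data in the order the parser reports them."""

    def __init__(self) -> None:
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_events(markup: str) -> list:
    """Feed markup to a fresh parser and return the recorded events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


events = parse_events("<title><p>Text</p></title>")
# Older parsers report ("start", "p") as a separate element; with <title> treated
# as text-only, the whole "<p>Text</p>" should arrive as title data instead.
print(sys.version.split()[0], events)
```

Running it on two interpreters side by side is a quick way to confirm which behavior a given Python build has before touching the tests.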

Solution

Related Python issues

…ext wrapping

Prevents wrap_bare_text from adding <p> tags inside <title> elements,
which caused HTML entity escaping issues in Python 3.13.6+ due to
changes in the html.parser module.
Title elements can only contain text content per the HTML spec.
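
The idea in this commit (skip <title> when wrapping bare text) can be sketched without BeautifulSoup using a toy tree. Element, NO_WRAP_ELEMENTS and to_html are hypothetical, and this wrap_bare_text is a simplified stand-in, not ReadabiliPy's implementation, which operates on a BeautifulSoup tree.

```python
import html
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """Hypothetical minimal element node: a tag name plus child elements/strings."""
    tag: str
    children: List[Union["Element", str]] = field(default_factory=list)


# Hypothetical skip list: text in <p> is already wrapped, and <title>
# may only contain text per the HTML spec, so neither gets new <p> tags.
NO_WRAP_ELEMENTS = {"p", "title"}


def wrap_bare_text(node: Element) -> Element:
    """Wrap non-whitespace bare text in <p>, leaving NO_WRAP_ELEMENTS untouched."""
    if node.tag in NO_WRAP_ELEMENTS:
        return node
    new_children: List[Union[Element, str]] = []
    for child in node.children:
        if isinstance(child, str):
            # Whitespace-only strings are kept as-is; real text gets a <p> wrapper.
            new_children.append(Element("p", [child]) if child.strip() else child)
        else:
            new_children.append(wrap_bare_text(child))
    node.children = new_children
    return node


def to_html(node: Union[Element, str]) -> str:
    """Serialize the toy tree, escaping text the way an HTML serializer would."""
    if isinstance(node, str):
        return html.escape(node, quote=False)
    inner = "".join(to_html(child) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


doc = Element("html", [
    Element("head", [Element("title", ["Text"])]),
    Element("body", [Element("div", ["Hello"])]),
])
# Prints: <html><head><title>Text</title></head><body><div><p>Hello</p></div></body></html>
print(to_html(wrap_bare_text(doc)))
```

Text in <div> is still wrapped, while <title> keeps its plain text, so the output no longer depends on how a given parser version treats markup inside <title>.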