-
-
Notifications
You must be signed in to change notification settings - Fork 10
Open
Description
User Story
As a maintainer,
I want robust test coverage for code blocks containing escaped backticks
so that the Markdown parser reliably handles edge cases without false positives.
Background
The current regex pattern in mdextractor/__init__.py uses r"```(?:\w+\s+)?(.*?)```" with re.DOTALL, which may prematurely close code blocks containing legitimate backticks (e.g., echo "```"). This creates maintenance risks:
- The
test_nested_code_blocksunit test demonstrates incorrect parsing of inner backticks - Real-world code snippets with escaped backticks could be truncated
- Language specifier detection might interfere with content extraction
Acceptance Criteria
- Add test case to
tests/test_mdextractor.pyverifying:def test_backticks_inside_code_content(self): text = '''```sh echo "```" ```''' self.assertEqual(extract_md_blocks(text), ['echo "```"'])
- Update regex pattern to handle backticks within code content
- Ensure existing tests pass after modification
- Verify extraction works for:
- Code blocks containing
\``,``, and```` sequences - Mixed-language examples with internal backticks
- Consecutive valid blocks separated by text
- Code blocks containing
- Document pattern limitations in function docstring if any edge cases remain