Fix markdown parser regex to support hyphenated language identifiers

**User Story**  
As a developer using markdown code blocks,  
I want to use hyphenated language identifiers (e.g., `python-3`) in code fences  
so that valid syntax isn’t rejected by the parser.

**Background**  
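The current regex pattern in `mdextractor/__init__.py` (`r"```(?:\w+\s+)?(.*?)```"`) does not recognize language specifiers that contain hyphens (e.g., `python-3`). CommonMark places no such character restriction on a fence's info string, so these identifiers are valid, and the current behavior breaks compatibility with tools/linters that use them. The cause is the `\w+` in the optional group: `\w` matches only letters, digits, and underscores, so the group cannot consume `python-3`. The group is then skipped and the specifier is left in the captured content, so valid code blocks are misparsed or ignored.  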
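The failure can be reproduced in isolation. The sketch below is not the project's code: it rebuilds the pattern quoted above, assumes it is compiled with `re.DOTALL` and applied with `findall` (neither is confirmed by this issue), and builds the fence from a `FENCE` helper so the example itself contains no literal fence.

```python
import re

# Hypothetical helper: three backticks, built programmatically.
FENCE = "`" * 3

# The pattern quoted in this issue. re.DOTALL is an assumption made here
# so that "." can span the newlines inside a code block.
OLD_PATTERN = re.compile(FENCE + r"(?:\w+\s+)?(.*?)" + FENCE, re.DOTALL)

plain = FENCE + "python\nprint('hi')\n" + FENCE
hyphenated = FENCE + "python-3\nprint('hi')\n" + FENCE

# "python" is consumed by the optional group, so only the code is captured.
plain_blocks = OLD_PATTERN.findall(plain)

# "\w+" stops at the hyphen, the optional group is skipped, and the
# specifier ends up inside the captured block.
hyphenated_blocks = OLD_PATTERN.findall(hyphenated)

print(plain_blocks)
print(hyphenated_blocks)
```

Under these assumptions the hyphenated block is still matched, but its first line (`python-3`) is returned as if it were code.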

**Acceptance Criteria**  
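- [ ] Update the regex in `extract_md_blocks` (file: `mdextractor/__init__.py`) to `r"```(?:[\w-]+\s+)?(.*?)```"`, keeping the language group non-capturing so the captured groups (and therefore the function's return value) do not change.  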
- [ ] Add test cases to `tests/test_mdextractor.py` verifying hyphenated identifiers:  
  - Test `python-3` as a language specifier.  
  - Test mixed alphanumeric-hyphen combinations (e.g., `rust-2021-edition`).  
- [ ] Ensure existing tests (e.g., `test_with_language_specifier`, `test_single_line`) pass with the updated regex.  
- [ ] Validate that code blocks without hyphens (e.g., `python`) remain unaffected.  
- [ ] Confirm nested backticks and malformed fences (e.g., `test_malformed_fences`) are still handled correctly.
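A minimal sketch of how the criteria above could be exercised. `extract_blocks_sketch` and `fenced` are hypothetical stand-ins, not the real `extract_md_blocks`; the `re.DOTALL` flag and the `.strip()` call are assumptions, and the language group is kept non-capturing so `findall` still returns plain strings.

```python
import re

# Hypothetical helper: three backticks, built programmatically.
FENCE = "`" * 3

# Same shape as the existing pattern, with "\w+" widened to "[\w-]+".
# re.DOTALL is an assumption, not confirmed by this issue.
NEW_PATTERN = re.compile(FENCE + r"(?:[\w-]+\s+)?(.*?)" + FENCE, re.DOTALL)


def extract_blocks_sketch(text):
    """Stand-in for extract_md_blocks; stripping is an assumption."""
    return [block.strip() for block in NEW_PATTERN.findall(text)]


def fenced(lang, body):
    """Wrap body in a fenced block with the given language specifier."""
    return FENCE + lang + "\n" + body + "\n" + FENCE


def test_hyphenated_identifier():
    assert extract_blocks_sketch(fenced("python-3", "print('hi')")) == ["print('hi')"]


def test_mixed_alphanumeric_hyphen_identifier():
    text = fenced("rust-2021-edition", "fn main() {}")
    assert extract_blocks_sketch(text) == ["fn main() {}"]


def test_unhyphenated_identifier_unaffected():
    assert extract_blocks_sketch(fenced("python", "x = 1")) == ["x = 1"]


def test_no_language_specifier():
    assert extract_blocks_sketch(fenced("", "x = 1")) == ["x = 1"]
```

The functions follow pytest naming, so they could be dropped into a test module and adapted to call the real function instead of the stand-in.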

Fix markdown parser regex to support hyphenated language identifiers #7
