Add HTML parser: extract inline <script> functions + id-anchored DocSection nodes

## Motivation

Hand-authored single-page HTML documents — design briefs, HLDs, READMEs rendered to HTML, internal architecture pages — frequently mix substantial prose content with a small inline `<script>` block driving navigation/search/scroll-tracking, plus a `<style>` block for layout.

Today CRG sees these as opaque files (a single `File` node, nothing else). The inline JS functions are invisible to `semantic_search`, and there's no way to ask "where is section §17 in this 6 KLOC doc" without falling back to grep.

This proposal adds first-class HTML support that's symmetric to the existing Vue/Svelte SFC handling.

## Proposal

Two extraction concerns, both already proven by `_parse_vue` / `_parse_svelte`:

### 1. Inline `<script>` blocks → delegate to JS parser
Identical pattern to `_parse_vue`: walk the HTML tree, find `script_element` nodes, parse their `raw_text` with the JavaScript grammar, propagate the resulting `Function` / `Class` / `CALLS` nodes/edges with `line_start` / `line_end` offset to position within the .html file.

### 2. Id-anchored elements → new `DocSection` node kind
For each `<div id="…">` / `<section id="…">` / `<article id="…">`, emit a `DocSection` node with the element's line range and `html_id` in `extra`. This is the navigation primitive a sidebar typically links to — making it searchable lets queries like *"where does the cover section start"* or *"which page has the search-tabs anchor"* resolve in one `semantic_search` call.

### Deliberately out of scope
- `<style>` block extraction. No CSS grammar plumbed in CRG yet; a line-range-only stub would have negligible value. Easy follow-up if/when a CSS extractor lands.
- Heading-anchor sections (`<h2 id="…">`). Common in GitHub-rendered READMEs but pollutes the index when nested inside `<div id="…">` doc sections. Could be added with an opt-in if there's demand.

## Reference implementation

Branch: [`feat/html-parser-support`](https://github.com/dannymn/code-review-graph/tree/feat/html-parser-support) on my fork.

3-file diff, +298 LOC:
- `code_review_graph/parser.py`: 2 lines added to `EXTENSION_TO_LANGUAGE`, 3-line dispatch in `parse_bytes`, ~140-line `_parse_html` method modeled on `_parse_vue`.
- `tests/test_parser.py`: 7 new tests in the existing `TestCodeParser` class.
- `tests/fixtures/sample_html.html`: one fixture exercising every code path.

Tests (all pass; full `test_parser.py` 110/110):
- `test_detect_language_html` (.html and .htm)
- `test_parse_html_file` — File node + Function + Class extracted from `<script>`
- `test_parse_html_doc_sections` — `#cover`, `#s1`, `#s2`, `#appendix` emitted
- `test_parse_html_line_numbers_offset` — script line numbers reflect .html position
- `test_parse_html_nodes_have_html_language` — every node tagged `language="html"`
- `test_parse_html_no_script` — DocSection emitted even without any script
- `test_parse_html_only_id_anchored_sections` — plain `<div>` without id excluded

Field-validated locally: indexed a 6 KLOC HTML (an HLD) and a 3 KLOC reference HTML (a design brief), found 60+ JS functions with accurate line ranges, 39 DocSections, all surfaced by `semantic_search` queries.

## Questions for the maintainer before opening a PR

1. **Node-kind naming.** `DocSection` is intentionally generic — could later cover Markdown headings or other doc formats. Would you prefer `HTMLSection` to keep it format-scoped? Or skip the new kind entirely and reuse `Class` / `File`?
2. **Tag whitelist for sections.** Currently `div` / `section` / `article` with `id`. Worth widening to any element with `id`? Or narrowing to just `section` / `article`?
3. **`<style>` handling.** Drop entirely (current proposal), emit a line-range-only `Stylesheet` stub, or wait until a CSS extractor lands?
4. **Heading-anchor sections** (`<h2 id="x">`). Opt-in flag, always-on, or out of scope?

Happy to adjust the patch to match your preferences before submitting a PR. Local CRG using this patch has been running for ~1 day with zero regressions across 869 indexed files.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add HTML parser: extract inline <script> functions + id-anchored DocSection nodes #521

Motivation

Proposal

1. Inline `<script>` blocks → delegate to JS parser

2. Id-anchored elements → new `DocSection` node kind

Deliberately out of scope

Reference implementation

Questions for the maintainer before opening a PR

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Add HTML parser: extract inline <script> functions + id-anchored DocSection nodes #521

Description

Motivation

Proposal

1. Inline <script> blocks → delegate to JS parser

2. Id-anchored elements → new DocSection node kind

Deliberately out of scope

Reference implementation

Questions for the maintainer before opening a PR

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

1. Inline `<script>` blocks → delegate to JS parser

2. Id-anchored elements → new `DocSection` node kind