HTMLCrunch

A clean, simple, lightweight HTML parser.

Features

follows the spec closely
parse elements, fragments and complete HTML documents
transform the parse tree and use the isCommentNode, isTextNode, isElementNode or generic isMNode guards to branch on the different cases
serialize nodes and fragments back to strings and optionally remove comments
the parser supports HTML end tag omissions
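As an illustration of branching with type guards, here is a self-contained TypeScript sketch. The node shapes (kind, tagName, attributes, children) and the serialize and tagNames helpers are hypothetical stand-ins, not HTMLCrunch's actual types or the signature of its serializeNode function; only the guard names echo the library's. It shows how each guard narrows a node union so every case can be handled separately, and how a serializer can optionally drop comments.

```typescript
// Hypothetical node shapes for illustration; HTMLCrunch's real types may differ.
type CommentNode = { kind: "COMMENT"; text: string };
type TextNode = { kind: "TEXT"; text: string };
type ElementNode = {
  kind: "ELEMENT";
  tagName: string;
  attributes: [string, string][];
  children: MNode[];
};
type MNode = CommentNode | TextNode | ElementNode;

// Type guards in the spirit of isCommentNode, isTextNode and isElementNode.
const isCommentNode = (n: MNode): n is CommentNode => n.kind === "COMMENT";
const isTextNode = (n: MNode): n is TextNode => n.kind === "TEXT";
const isElementNode = (n: MNode): n is ElementNode => n.kind === "ELEMENT";

// Serializes a node, optionally dropping comments.
// Void elements and entity escaping are ignored in this sketch.
function serialize(node: MNode, removeComments = false): string {
  if (isCommentNode(node)) return removeComments ? "" : `<!--${node.text}-->`;
  if (isTextNode(node)) return node.text;
  // Both guards failed, so TypeScript has narrowed `node` to ElementNode.
  const attrs = node.attributes
    .map(([name, value]) => (value === "" ? ` ${name}` : ` ${name}="${value}"`))
    .join("");
  const inner = node.children.map((child) => serialize(child, removeComments)).join("");
  return `<${node.tagName}${attrs}>${inner}</${node.tagName}>`;
}

// Collects element tag names depth-first, skipping text and comment nodes.
function tagNames(node: MNode, out: string[] = []): string[] {
  if (isElementNode(node)) {
    out.push(node.tagName);
    for (const child of node.children) tagNames(child, out);
  }
  return out;
}

const tree: MNode = {
  kind: "ELEMENT",
  tagName: "p",
  attributes: [["class", "note"]],
  children: [
    { kind: "TEXT", text: "Hello " },
    { kind: "COMMENT", text: " todo " },
    { kind: "ELEMENT", tagName: "b", attributes: [], children: [{ kind: "TEXT", text: "world" }] },
  ],
};

console.log(serialize(tree)); // <p class="note">Hello <!-- todo --><b>world</b></p>
console.log(serialize(tree, true)); // <p class="note">Hello <b>world</b></p>
console.log(tagNames(tree).join(", ")); // p, b
```

Relying on the guards rather than on manual property checks lets the compiler confirm that each branch only touches fields that exist on that kind of node.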

Getting Started

Depending on your package manager:

deno add jsr:@fcrozatier/htmlcrunch
pnpm i jsr:@fcrozatier/htmlcrunch
npx jsr add @fcrozatier/htmlcrunch
yarn add jsr:@fcrozatier/htmlcrunch
bunx jsr add @fcrozatier/htmlcrunch

Simple Example

import { fragments, serializeFragments } from "@fcrozatier/htmlcrunch";
import { assertEquals } from "@std/assert";

// A string of HTML, e.g. the contents of an HTML file
const content = `<div>html string...</div>`;

// Parse it with the `element`, `fragments` or `html` parsers
const parsed = fragments.parseOrThrow(content);

// Walk the parse tree, analyse and modify it ...

// Serialize the result with `serializeNode` or `serializeFragments`
const serialized = serializeFragments(parsed);

assertEquals(content, serialized);

Spec

HTMLCrunch implements the following parts of the HTML spec:

spec	status
Structure
- document structure	✅
- modern doctype	✅
Elements
- self-closing void elements	✅
- raw text elements	✅
- foreign elements (MathML & SVG namespaces)	✅
- normal elements	✅
Attributes
- Empty attribute syntax	✅
- Unquoted attribute value syntax	✅
- Single-quoted attribute value syntax	✅
- Double-quoted attribute value syntax	✅
Optional tags
- end tag omission	✅
- start tag omission	🚫 (not planned)
content model validation and restriction	⚠️ (not supported)
text	✅
CDATA sections	✅
comments	✅

End Tag Omission

In HTML, the end tags of <li>, <dt>, <dd>, <p> and <option> elements, as well as the end tags of the elements inside a <table>, can be omitted under the conditions listed in the spec, for a lighter authoring experience.
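To make the rules concrete, here is a simplified TypeScript sketch of how implied end tags can be inserted into a token stream. It encodes the spec's "immediately followed by" rules for <li>, <dt>, <dd>, <option> and <p> only (the table rules are left out), and it treats a parent's end tag as closing any children still open, which simplifies the spec's "no more content in the parent element" condition. The closedBy table and the impliesEnd and closeOmitted functions are hypothetical names; this is not how HTMLCrunch implements omission.

```typescript
// Hypothetical sketch of a subset of the spec's end tag omission rules.
// Keys are open elements; values are start tags that, when they immediately
// follow, imply the open element's end tag.
const closedBy: Record<string, string[]> = {
  li: ["li"],
  dt: ["dt", "dd"],
  dd: ["dd", "dt"],
  option: ["option", "optgroup"],
  p: [
    "address", "article", "aside", "blockquote", "details", "dialog", "div",
    "dl", "fieldset", "figcaption", "figure", "footer", "form", "h1", "h2",
    "h3", "h4", "h5", "h6", "header", "hgroup", "hr", "main", "menu", "nav",
    "ol", "p", "pre", "section", "table", "ul",
  ],
};

function impliesEnd(open: string, next: string): boolean {
  return (closedBy[open] ?? []).includes(next);
}

// Inserts omitted end tags. Tokens are bare tags without attributes
// ("<li>", "</ul>"); any other token is treated as text.
function closeOmitted(tokens: string[]): string[] {
  const out: string[] = [];
  const stack: string[] = []; // names of currently open elements
  for (const token of tokens) {
    const start = /^<([a-z][a-z0-9]*)>$/.exec(token);
    const end = /^<\/([a-z][a-z0-9]*)>$/.exec(token);
    if (start) {
      // Close every open element whose end tag this start tag implies.
      while (stack.length > 0 && impliesEnd(stack[stack.length - 1], start[1])) {
        out.push(`</${stack.pop()}>`);
      }
      stack.push(start[1]);
    } else if (end) {
      // An explicit end tag closes any children that are still open.
      while (stack.length > 0 && stack[stack.length - 1] !== end[1]) {
        out.push(`</${stack.pop()}>`);
      }
      stack.pop();
    }
    out.push(token); // the token itself is always kept
  }
  return out;
}

console.log(closeOmitted(["<ul>", "<li>", "Apples", "<li>", "Bananas", "</ul>"]).join(""));
// <ul><li>Apples</li><li>Bananas</li></ul>
```

The real spec adds conditions this sketch skips, for instance a <p> end tag cannot be omitted at the end of an <a>, <audio>, <del>, <ins>, <map>, <noscript> or <video> parent.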

import { element, serializeNode } from "@fcrozatier/htmlcrunch";

// Omit `<li>` end tags
element.parseOrThrow(
  `<ul>
    <li>Apples
    <li>Bananas
  </ul>`,
);

// Omit `<dt>` and `<dd>` end tags
element.parseOrThrow(
  `<dl>
    <dt>Coffee
    <dd>Black hot drink
    <dt>Milk
    <dd>White cold drink
  </dl>`,
);

// Omit `<p>` end tags
element.parseOrThrow(
  `<body>
    <p>This is the first paragraph.
    <p>This is the second paragraph, and it ends when the next div begins.
    <div>A block element</div>
  </body>`,
);

// Omit `<option>` end tags
element.parseOrThrow(
  `<select>
    <option value="1">One
    <option value="2">Two
    <option value="3">Three
  </select>`,
);

// Omit end tags inside a `<table>`
const table = element.parseOrThrow(
  `<table>
  <caption>37547 TEE Electric Powered Rail Car Train Functions (Abbreviated)
  <colgroup><col><col><col>
  <thead>
   <tr> <th>Function                              <th>Control Unit     <th>Central Station
  <tbody>
   <tr> <td>Headlights                            <td>✔                <td>✔
   <tr> <td>Interior Lights                       <td>✔                <td>✔
   <tr> <td>Electric locomotive operating sounds  <td>✔                <td>✔
   <tr> <td>Engineer's cab lighting               <td>                 <td>✔
   <tr> <td>Station Announcements - Swiss         <td>                 <td>✔
  </table>`,
);

// Serialize the parsed table back to a string
console.log(serializeNode(table));

API

The interactive documentation is available on JSR.

The element, fragments, html and shadowRoot parsers are Monarch parsers and can thus be composed and extended with other Monarch parsers.

Their main methods are parse and parseOrThrow. See the Monarch documentation for the other available methods.
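For readers new to parser combinators, the toy TypeScript sketch below shows what composing and extending parsers means, and how a result-returning parse can differ from a throwing parseOrThrow. Everything in it (the Parser class, the Result shape, and the regex, seq and many helpers) is hypothetical and is not Monarch's API; check the Monarch documentation for the real combinators and return types.

```typescript
// Toy parser combinators. NOT Monarch's API: Parser, Result, regex, seq and
// many are hypothetical names used only to illustrate composition.
type Result<T> =
  | { success: true; value: T; rest: string }
  | { success: false; message: string };

class Parser<T> {
  run: (input: string) => Result<T>;

  constructor(run: (input: string) => Result<T>) {
    this.run = run;
  }

  // Returns a result object and never throws.
  parse(input: string): Result<T> {
    return this.run(input);
  }

  // Returns the value, or throws on failure or leftover input.
  parseOrThrow(input: string): T {
    const result = this.run(input);
    if (!result.success) throw new Error(result.message);
    if (result.rest !== "") throw new Error(`Unexpected input: ${result.rest}`);
    return result.value;
  }

  // Extends a parser by transforming its value.
  map<U>(f: (value: T) => U): Parser<U> {
    return new Parser<U>((input) => {
      const r = this.run(input);
      return r.success ? { success: true, value: f(r.value), rest: r.rest } : r;
    });
  }
}

// Matches a regex anchored at the start of the input (use ^ in the pattern).
const regex = (re: RegExp): Parser<string> =>
  new Parser<string>((input) => {
    const m = re.exec(input);
    return m && m.index === 0
      ? { success: true, value: m[0], rest: input.slice(m[0].length) }
      : { success: false, message: `Expected ${re}` };
  });

// Runs two parsers one after the other.
const seq = <A, B>(a: Parser<A>, b: Parser<B>): Parser<[A, B]> =>
  new Parser<[A, B]>((input) => {
    const ra = a.parse(input);
    if (!ra.success) return ra;
    const rb = b.parse(ra.rest);
    if (!rb.success) return rb;
    return { success: true, value: [ra.value, rb.value], rest: rb.rest };
  });

// Repeats a parser until it fails or stops consuming input.
const many = <A>(p: Parser<A>): Parser<A[]> =>
  new Parser<A[]>((input) => {
    const values: A[] = [];
    let rest = input;
    for (;;) {
      const r = p.parse(rest);
      if (!r.success || r.rest === rest) break;
      values.push(r.value);
      rest = r.rest;
    }
    return { success: true, value: values, rest };
  });

// Compose small parsers into a bigger one, then extend the result with map.
const word = regex(/^[a-z]+/);
const spacedWord = seq(regex(/^\s*/), word).map(([, w]) => w);
const words = many(spacedWord);

console.log(words.parseOrThrow("alpha beta gamma").join(" ")); // alpha beta gamma
console.log(words.parse("alpha 42").success); // true: "alpha" parsed, " 42" left over
```

Because every combinator returns another Parser, the pieces nest freely: the same mechanism lets a larger grammar reuse a smaller parser as one of its building blocks.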
