How to stringify a parsed Markdown AST as-is (preserving the original source) while changing only some nodes #1460
Intro

I'd like to use remark (or the underlying …

In my experiments (see below for an example), I found that …

Example

In the following example (written in TypeScript for Deno), my input is a Markdown source text with several incongruities and inconsistencies (multiple strong and emphasis styles, a somewhat unusual format for thematic breaks, etc.):

```markdown
Heading
=======

[Link 1](http://example.com)

- - -

- __Lorem__ **ipsum**
- _dolor_ *sit* [link 2](http://example.com)
```

I want my code to selectively edit the …

```ts
// rewrite-markdown.ts
//
// This is a script written for Deno. You should be able to run it with:
//
//   deno run rewrite-markdown.ts

import type * as Mdast from "npm:@types/mdast"
import * as From from "npm:mdast-util-from-markdown"
import * as To from "npm:mdast-util-to-markdown"
import * as V from "npm:unist-util-visit"

const input = `Heading
=======

[Link 1](http://example.com)

- - -

- __Lorem__ **ipsum**
- _dolor_ *sit* [link 2](http://example.com)
`

// Parse Markdown to AST
const tree: Mdast.Root = From.fromMarkdown(input)

// Walk AST and edit link nodes
V.visit(tree, "link", (node: Mdast.Link, _index, _parent): V.VisitorResult => {
  if (node.url.startsWith("http://")) {
    node.url = node.url.replace("http://", "https://")
  }
  return V.SKIP
})

// Write modified AST back into Markdown text
const output = To.toMarkdown(tree)
console.log(output)
```

Actual vs. desired result

```sh
deno run --check rewrite-markdown.ts
```

This prints (actual result):

```markdown
# Heading

[Link 1](https://example.com)

***

* **Lorem** **ipsum**
* *dolor* *sit* [link 2](https://example.com)
```

I would like it to print (desired result):

```markdown
Heading
=======

[Link 1](https://example.com)

- - -

- __Lorem__ **ipsum**
- _dolor_ *sit* [link 2](https://example.com)
```

That is, the output should preserve all idiosyncrasies of the original source except for the link nodes I edited; the only change compared to the input is "http" to "https". As you can see, the actual output is normalized Markdown: it uses a single consistent style for bullets, strong, emphasis, headings and thematic breaks, which is not what I want.

Possible solutions

One idea I had was to install custom …

```diff
 // rewrite-markdown.ts
 …
 // Write modified AST back into Markdown text
- const output = To.toMarkdown(tree)
+ const toMarkdownOptions: To.Options = {
+   handlers: {
+     "thematicBreak": (node: Mdast.ThematicBreak, parent, state, info): string => {
+       // Slice the relevant segment directly out of the input text
+       return input.slice(node.position!.start.offset, node.position!.end.offset)
+     }
+   }
+ }
+ const output = To.toMarkdown(tree, toMarkdownOptions)
 console.log(output)
```

Now the original format of the thematic break is preserved in the rendered output. Great! This approach seems relatively straightforward (if a little tedious to write) for leaf nodes of the tree, but I don't know how to achieve the same for nodes that can have children. E.g. in my example, the …

I feel this approach could potentially work, but I don't understand the mdast APIs well enough to make it work. Any ideas? Or are there other solutions you can think of?

Thanks for reading, and thank you to everybody who works on these packages. I love the unified/remark/micromark ecosystem!
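For nodes with children, one possible direction (a sketch, not an official recipe) is a handler that takes only the node's own delimiter from the original source and serializes the children normally with `state.containerPhrasing`. This assumes mdast-util-to-markdown v2, where handlers receive `(node, parent, state, info)` and `state.enter` / `state.containerPhrasing` are available; `originalMarker` is a hypothetical helper introduced here. It only covers `strong` and `emphasis`. Headings and list bullets are handled with the global `setext`, `bullet` and `listItemIndent` options, which happen to match this particular input but would not preserve mixed styles elsewhere in a document.

```typescript
// Sketch for Deno: deno run preserve-markers.ts (hypothetical file name)
import type * as Mdast from "npm:@types/mdast"
import { fromMarkdown } from "npm:mdast-util-from-markdown"
import { toMarkdown, type Options } from "npm:mdast-util-to-markdown"
import { visit } from "npm:unist-util-visit"

const input = `Heading
=======

[Link 1](http://example.com)

- - -

- __Lorem__ **ipsum**
- _dolor_ *sit* [link 2](http://example.com)
`

const tree: Mdast.Root = fromMarkdown(input)

// Same edit as in the question: upgrade http:// links to https://
visit(tree, "link", (node: Mdast.Link) => {
  if (node.url.startsWith("http://")) {
    node.url = node.url.replace("http://", "https://")
  }
})

// Hypothetical helper: read `length` characters of the original delimiter at the
// start of `node`, or use `fallback` for nodes without positional info (e.g. new nodes).
function originalMarker(node: Mdast.Node, length: number, fallback: string): string {
  const start = node.position?.start.offset
  return start === undefined ? fallback : input.slice(start, start + length)
}

const options: Options = {
  // Global style choices: they match this input, but apply to every node of that kind.
  bullet: "-",
  listItemIndent: "one",
  setext: true,
  handlers: {
    // Leaf node: copy the original source verbatim.
    thematicBreak(node: Mdast.ThematicBreak) {
      return input.slice(node.position!.start.offset, node.position!.end.offset)
    },
    // Container nodes: reuse the original marker, serialize the children as usual.
    strong(node: Mdast.Strong, _parent, state, info) {
      const marker = originalMarker(node, 2, "**")
      const exit = state.enter("strong")
      const inner = state.containerPhrasing(node, { ...info, before: marker, after: marker.charAt(0) })
      exit()
      return marker + inner + marker
    },
    emphasis(node: Mdast.Emphasis, _parent, state, info) {
      const marker = originalMarker(node, 1, "*")
      const exit = state.enter("emphasis")
      const inner = state.containerPhrasing(node, { ...info, before: marker, after: marker })
      exit()
      return marker + inner + marker
    },
  },
}

const output = toMarkdown(tree, options)
console.log(output)
```

For this input the output should match the desired result from the question. Note that these handlers skip the character-escaping logic of the built-in `strong`/`emphasis` handlers, so unusual surroundings may still need care.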
Duplicate of https://github.com/orgs/remarkjs/discussions/1067, https://github.com/orgs/remarkjs/discussions/1194, and micromark/micromark#59
remark is an AST library. AST is short for Abstract Syntax Tree; the "abstract" part focuses on making the structure easy to work with, intentionally glossing over or normalizing stylistic parts of the language, such as whitespace or which list marker is used, that don't change how the document is structured or displayed (https://en.wikipedia.org/wiki/Abstract_syntax_tree).

What you are describing is a Concrete Syntax Tree (https://en.wikipedia.org/wiki/Parse_tree), which could be built on top of micromark (syntax-tree/mdast#36 (comment)), but would have a completely different structure and way of working from remark or rehype.
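The normalization is visible in the tree itself. A small sketch, assuming the same Deno/npm setup as the question (`structure` is a hypothetical helper name): once the `position` fields are dropped, `__Lorem__` and `**Lorem**` parse to identical trees, as do `- - -` and `***`, so a serializer working only from the tree has nothing left from which to recover the original marker.

```typescript
import { fromMarkdown } from "npm:mdast-util-from-markdown"

// Serialize the mdast tree for `markdown` as JSON, dropping positional info
// so that only the abstract structure is compared.
function structure(markdown: string): string {
  return JSON.stringify(fromMarkdown(markdown), (key, value) => (key === "position" ? undefined : value))
}

// Different source syntax, same abstract structure.
const sameStrong = structure("__Lorem__") === structure("**Lorem**")
const sameBreak = structure("- - -") === structure("***")
console.log(sameStrong, sameBreak) // true true
```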
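You could do things with the positional info. Nodes produced by the parser carry a `position` (with `start.offset` and `end.offset`) that records where in the original file they came from. You can then change only the text at particular positions yourself, leaving the rest of the source untouched.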
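A minimal sketch of that idea, assuming the Deno/npm setup from the question: instead of serializing the tree, collect text replacements keyed by source offsets and apply them to the original string, last one first, so untouched regions stay byte-for-byte identical. The `Splice` type and `applySplices` helper are hypothetical names, and the destination lookup assumes inline links written as `[text](url)`; autolinks, reference-style links and `<…>` destinations are simply left alone.

```typescript
import type * as Mdast from "npm:@types/mdast"
import { fromMarkdown } from "npm:mdast-util-from-markdown"
import { visit } from "npm:unist-util-visit"

// Hypothetical: replace source[start, end) with `text`.
type Splice = { start: number; end: number; text: string }

// Apply splices from the end of the string backwards so earlier offsets stay valid.
// Assumes the splices do not overlap.
function applySplices(source: string, splices: Splice[]): string {
  let result = source
  for (const { start, end, text } of [...splices].sort((a, b) => b.start - a.start)) {
    result = result.slice(0, start) + text + result.slice(end)
  }
  return result
}

const input = `Heading
=======

[Link 1](http://example.com)

- - -

- __Lorem__ **ipsum**
- _dolor_ *sit* [link 2](http://example.com)
`

const tree: Mdast.Root = fromMarkdown(input)
const splices: Splice[] = []

visit(tree, "link", (node: Mdast.Link) => {
  const start = node.position?.start.offset
  const end = node.position?.end.offset
  if (start === undefined || end === undefined || !node.url.startsWith("http://")) return
  // Assumption: inline link `[text](url ...)`, so the destination follows "](".
  const original = input.slice(start, end)
  const destination = original.lastIndexOf("](" + node.url)
  if (destination === -1) return // other link forms: leave untouched
  const urlStart = start + destination + 2
  splices.push({ start: urlStart, end: urlStart + "http://".length, text: "https://" })
})

const output = applySplices(input, splices)
console.log(output)
```

The trade-off is that every edit has to be expressed as a source-level replacement rather than as a change to the tree, but nothing outside the edited ranges is ever re-serialized.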