Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize mdast to markdown #127

Merged
merged 74 commits into from
Oct 7, 2024
Merged

Serialize mdast to markdown #127

merged 74 commits into from
Oct 7, 2024

Conversation

bnchi
Copy link
Contributor

@bnchi bnchi commented Aug 27, 2024

fixes #64

TODO handlers :

  • paragraph
  • text
  • strong
  • emphasis
  • heading
  • html
  • break
  • thematic-break
  • code
  • blockquote
    • Complete the tests for blockquote
  • image
  • inline-code
  • link
  • list
  • list-item
    • Add the tests for list item
  • root
  • definition
  • image-reference
  • link-reference

@bnchi bnchi changed the title Serializing mdast to markdown Serialize mdast to markdown Aug 27, 2024
@bnchi
Copy link
Contributor Author

bnchi commented Aug 31, 2024

While working on this, I noticed that the JS implementation relies heavily on regular expressions in some places. I believe we can rewrite some parts of the code in Rust instead of depending on the regex crate. Although the regex crate guarantees linear time complexity, we can eliminate the need for the regex engine in places where simpler Rust code would work. Maybe we can benchmark first but this won't happen until I move over most of the tests into Rust.

what do you think @wooorm ?

One more question what do you think make more sense here for the maintainability of this utility :

  1. To create a new workspace for mdast to markdown ?
  2. To create a separate module to group most of the related code for this util ?

@wooorm
Copy link
Owner

wooorm commented Sep 2, 2024

While the core of markdown-rs is no-std + alloc, that’s for the use case where you only really do markdown -> html, straight.
Here we are working with ASTs. I think it’s fine to depend on regexes there.
And, both JS and Rust will need to be maintained and kept in sync. So, fine to improve things if it’s fast, just, that has to be backported.
I’d recommend going with regexes first. To focus on porting the code. Instead of changing the code. That becomes confusing. Hard to manage.

As for workspace vs module, I have to particular preference currently. You?

@bnchi
Copy link
Contributor Author

bnchi commented Sep 3, 2024

Since we're working with an AST and this is a utility a workspace sounds better to me this will help to isolate the tests when we quickly run cargo test and version this util by it's own too.

It's not that hard to change that decision later on, I updated the PR to use a workspace instead and kept the no_std constrain.

@bnchi bnchi requested a review from wooorm October 5, 2024 07:10
@bnchi
Copy link
Contributor Author

bnchi commented Oct 5, 2024

I completed copying over the tests so now all of the tests are a match, I also did a quick review everything looks fine to me, defiantly a few performance improvements and refactors can be done in a separate PR but now since the tests are moved over it's easy to make changes without worrying about something breaking.

I've also added comments as proposed for some data structures.

@augustkline
Copy link

augustkline commented Oct 6, 2024

Just a heads-up, serializing a yaml node to markdown gives an error. I don't know if that was intentional, but if not you might be missing a yaml handler

@augustkline
Copy link

Thank you so much for this PR btw! Really needed this feature

@bnchi
Copy link
Contributor Author

bnchi commented Oct 6, 2024

wooorm knows more about this than me, but I think Yaml and Toml are not part of the CommonMark specs they're part of the frontmatter extension this PR ports over the JS implementation here mdast-util-to-markdown

In order for Yaml and Toml to work we need to add support for frontmatter serialization I believe the JS version is here mdast-util-frontmatter

@augustkline
Copy link

Ah gotcha! Thanks for the info & links!

@ChristianMurphy
Copy link
Collaborator

Some extensions have built in parsers that can be turned on with an option.
Including frontmatter! https://github.com/wooorm/markdown-rs?tab=readme-ov-file#extensions

It could make sense to support serialization for those tools.
But this PR is pretty sizable already.
I'd lean towards having extensions in follow up PRs.

@ChristianMurphy
Copy link
Collaborator

Also additional testing, using a fuzzer to to validate that content can round trip.

In other words markdown can be parsed to a tree, be serialized again, and parsed again should have the same tree structure (though positions and exact characters may change).

That technique found a lot of quirks in the original micromark/remark parser and serialized.
And could be valuable to repeat here.
But is likely outside the scope of this PR.

References:
About fuzzing: https://rust-fuzz.github.io/book/
Current parser fuzz test: https://github.com/wooorm/markdown-rs/tree/main/fuzz

@wooorm
Copy link
Owner

wooorm commented Oct 7, 2024

But this PR is pretty sizable already.
I'd lean towards having extensions in follow up PRs.

Right yes, that indeed. Somewhere above @bnchi and I talked about this too and concluded the same.

I believe @bnchi has used up a lot of the time they have to work on this. So if @augustkline or others are up for follow up PRs, I’d love that!

@wooorm wooorm merged commit 567e7e0 into wooorm:main Oct 7, 2024
3 checks passed
@wooorm

This comment was marked as resolved.

@wooorm

This comment was marked as resolved.

@wooorm
Copy link
Owner

wooorm commented Oct 7, 2024

Only thing perhaps is some docs in this readme here?
And, how should this new code be used? Should it be published? As alpha?

@Enoumy
Copy link

Enoumy commented Oct 8, 2024

Thank you so much for writing this PR!

@bnchi
Copy link
Contributor Author

bnchi commented Oct 8, 2024

Only thing perhaps is some docs in this readme here?

Yeah I can add instructions on how to use this code in a readme

Should it be published? As alpha?

Alpha sounds alright, what do you think ?

@wooorm
Copy link
Owner

wooorm commented Oct 10, 2024

Sounds good to me! 👍

@766F6964
Copy link

@bnchi I am currently playing around with this, since I need a pipeline:
_markdown data _ -> markdown ast -> transformations on markdown ast -> back to markdown

I got this setup working, however, something that appears to be completely missing are tables?

@bnchi
Copy link
Contributor Author

bnchi commented Dec 27, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Serializing mdast to markdown
6 participants