Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

micromark: render shouldn't freeze the page if jump source and destination are equal #186

Closed
4 tasks done
OEvgeny opened this issue Nov 12, 2024 · 9 comments
Closed
4 tasks done
Labels
🙋 no/question This does not need any changes 👎 phase/no Post cannot or will not be acted on

Comments

@OEvgeny
Copy link

OEvgeny commented Nov 12, 2024

Initial checklist

Affected packages and versions

[email protected] and its dep [email protected]

Link to runnable example

https://bolt.new/~/sb1-7ektru or https://stackblitz.com/edit/sb1-7ektru

Steps to reproduce

  1. Open the link provided and wait until Preview appears
  2. Click Clear button
  3. Right click on the frame page, and select Inpsect (so you could stop execution)
  4. Type the following
\[
\
\]
  1. Notice the page freezes.
  2. Go to Devtools sources pane and hit pause button.
  3. Inspect the jumps object
    Screenshot from 2024-11-12 13-46-19

Environment details provided directly in the repro. Please use built-in terminal to verify versions.

❯ npm -v
10.2.3
❯ node -v
v18.20.3

Expected behavior

While the extension could have mistakes in token placement, I think it's still expected that parser won't hang during such error. EIther such jumps should be omitted or bail.

Actual behavior

Page hangs in an infinite while loop (line 2698 on the screenshot).

Runtime

Other (please specify in steps to reproduce)

Package manager

Other (please specify in steps to reproduce)

OS

Other (please specify in steps to reproduce)

Build and bundle tools

Vite

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Nov 12, 2024
@ChristianMurphy
Copy link
Member

Welcome @OEvgeny! 👋
Sorry you ran into a spot of trouble.
It looks like you are trying to overload [] to mean either a reference link to to mean math?
I'd recommend not doing that, it can cause crashes due to conflicts in the tokenizer as you are running into.
That isn't a micromark bug, but a but in the custom tokenizer.


While not stated here, that looks a lot like ChatGPT generated math.
While not finalized, I'd highly recommend reading unifiedjs/unifiedjs.github.io#46

@ChristianMurphy
Copy link
Member

Adding a line like

- **LaTeX Formatting for Clarity**: When presenting math, use LaTeX with "`$...$`" for inline math (e.g., `$x^2 + y^2 = z^2$`) and "`$$...$$`" for display math (e.g., `$$\int x^2 \, dx$$`).

to the LLM prompt resolves the issue without needing a customer tokenizer.

@ChristianMurphy
Copy link
Member

sigh I see this is related to microsoft/BotFramework-WebChat#5353
Please, use the markdown flavor your renderer uses instead of creating a custom one, it doesn't "improve rendering and error handling".
It makes yet another custom markdown flavor https://github.com/micromark/micromark?tab=readme-ov-file#extending-markdown

@ChristianMurphy ChristianMurphy added the 🙋 no/question This does not need any changes label Nov 12, 2024
Copy link

Hi! Thanks for reaching out! Because we treat issues as our backlog, we close issues that are questions since they don’t represent a task to be completed.

See our support docs for how and where to ask questions.

Thanks,
— bb

@github-actions github-actions bot added 👎 phase/no Post cannot or will not be acted on and removed 🤞 phase/open Post is being triaged manually labels Nov 12, 2024
@OEvgeny
Copy link
Author

OEvgeny commented Nov 12, 2024

It looks like you are trying to overload [] to mean either a reference link to to mean math?

Correct, we need \[ and \] to be the open and closing tags for math blocks. But it isn't related to the issue really.

That isn't a micromark bug, but a but in the custom tokenizer.

I saw some assertions for tokens are already in place. I don't see why such simple assertion is not welcomed, even if it's not met in micromarks code, rather a result of buggy tokenizer extension.

sigh I see this is related to microsoft/BotFramework-WebChat#5353
(...)
it doesn't "improve rendering and error handling".

It doesn't, another part of the PR does though.

sigh I see this is related to microsoft/BotFramework-WebChat#5353

Unfortunately we're forced to use the different flavor of markdown. I'm not happy about it as well.

@ChristianMurphy
Copy link
Member

I saw some assertions for tokens are already in place. I don't see why such simple assertion is not welcomed, even if it's not met in micromarks code, rather a result of buggy tokenizer extension.

I'd defer some to @wooorm on this.
Most assertions are used in develop mode.
More of those are probably fine, but changing the built/production output would have performance implications and would take more consideration.

@wooorm
Copy link
Member

wooorm commented Nov 13, 2024

Assertions are indeed only in development mode. They are compiled away. Are you running in development mode? If you are running in production mode, it is very likely that you are missing all the assertions.

another part of the PR does though.

Which PR?


Especially backslashes, and especially square brackets, are already highly used in markdown. Better to use something else.


Markdown is not a good grammar. How it all hangs together is complex and everything interacts with each other. It is a goal to allow extensions, but: not everything is possible.

@OEvgeny
Copy link
Author

OEvgeny commented Nov 13, 2024

Assertions are indeed only in development mode. They are compiled away. Are you running in development mode? If you are running in production mode, it is very likely that you are missing all the assertions.

Not sure which build I'm using, but from the screenshot above:

  1. Inspect the jumps object
    Screenshot from 2024-11-12 13-46-19

On the line 2702 you can spot ok(...) call to check tokenizer validity. I'm pretty sure that something should be done, so jumps don't contain entries where source and destination is the same index (5 => 5) in the example.

Could add the check into the function calculating jumps array. Or at least don't emit such entries.

Which PR?

Updated the original quote to include the context.

Especially backslashes, and especially square brackets, are already highly used in markdown. Better to use something else.

We'll keep this in mind, thank you.

Markdown is not a good grammar. How it all hangs together is complex and everything interacts with each other. It is a goal to allow extensions, but: not everything is possible.

I feel the same, that's why I opened the issue, this seems like an easy improvement to make both DX and UX better.

@ChristianMurphy
Copy link
Member

Not sure which build I'm using, but from the screenshot above:

If you're not sure, probably the production build.
If you are looking for more detailed feedback use the develop build.
There's a guide on how to here: https://github.com/micromark/micromark?tab=readme-ov-file#size--debug

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🙋 no/question This does not need any changes 👎 phase/no Post cannot or will not be acted on
Development

No branches or pull requests

3 participants