Fix generating partially valid tokens #3

mattiasarro · 2023-05-30T19:24:35Z

Matching a regex partially can lead to generating a token which causes the whole generated sequence to be invalid, even if a substring of the token would result in a valid output.

The other option would be to tweak complete_re so that we run the if stop_after_match: block after appending every character of the token (rather than the full token text) to the output text, but that's less clean. Or is that needed to be able to generate some output sequences which can only occur by generating a larger invalid token and then pruning the output?

Edit: looks like we need the latter approach, see latest commit.

Matching a regex partially can lead to generating a token which causes the whole generated sequence to be invalid.

Using partial=True ensures we don't generate invalid output, but it also makes it impossible to generate certain output sequences. Therefore it's necessary to allow generating tokens which match only partially, and then take the substring of that token which matches the regex.
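The trimming step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: longest_valid_prefix is a hypothetical helper, and it uses only the standard re module, which has no partial matching, so it covers only the case where some prefix of the token completes the pattern.

```python
import re
from typing import Optional


def longest_valid_prefix(pattern: re.Pattern, text: str, token: str) -> Optional[str]:
    """Return the longest non-empty prefix of `token` such that
    `text + prefix` fully matches `pattern`, or None if no prefix does.

    Hypothetical helper for illustration only.
    """
    # Try the whole token first, then drop one trailing character at a time.
    for end in range(len(token), 0, -1):
        candidate = token[:end]
        if pattern.fullmatch(text + candidate):
            return candidate
    return None


pattern = re.compile(r"\d{3}")
generated = "12"

# The token "34" as a whole overshoots the pattern ("1234" is not three digits),
# but its prefix "3" completes a valid output.
print(longest_valid_prefix(pattern, generated, "34"))
# No prefix of "ab" can complete the pattern.
print(longest_valid_prefix(pattern, generated, "ab"))
```

Checking prefixes from longest to shortest keeps as much of the sampled token as the regex allows, at the cost of one fullmatch call per character of the token.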

freckletonj · 2023-09-05T00:25:25Z

I'm interested in this solution too, as I was having the same parserllm issues as in: r2d4/parserllm#4

Also, the outlines project may interest you. They precompile valid continuations, and then inference happens in O(c).
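To illustrate the general idea of precompiling valid continuations, here is a toy sketch. It is not the outlines implementation or API: the hand-written automaton for the pattern \d\d, the tiny vocabulary, and all function names are assumptions made up for this example.

```python
from typing import Dict, List, Optional

DIGITS = "0123456789"

# Hypothetical hand-written automaton for the pattern \d\d.
# State n means "n digits consumed so far"; state 2 is accepting.


def step(state: int, ch: str) -> Optional[int]:
    """Advance by one character, or return None if the character is not allowed."""
    if ch in DIGITS and state < 2:
        return state + 1
    return None


def run(state: int, token: str) -> Optional[int]:
    """Feed a whole token through the automaton, starting from `state`."""
    current: Optional[int] = state
    for ch in token:
        current = step(current, ch)
        if current is None:
            return None
    return current


def build_index(num_states: int, vocab: List[str]) -> Dict[int, Dict[str, int]]:
    """Precompute, for every state, which tokens are allowed and where they lead."""
    index: Dict[int, Dict[str, int]] = {}
    for state in range(num_states):
        allowed: Dict[str, int] = {}
        for token in vocab:
            next_state = run(state, token)
            if next_state is not None:
                allowed[token] = next_state
        index[state] = allowed
    return index


VOCAB = ["1", "23", "456", "a", "7b"]  # toy vocabulary, assumed for illustration
INDEX = build_index(3, VOCAB)

# At generation time, finding the allowed tokens is a single dictionary lookup.
print(INDEX[0])
print(INDEX[1])
print(INDEX[2])
```

The index is built once per pattern and vocabulary, so the per-step cost during generation no longer depends on re-running the matcher over every candidate token.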

The issue I have with outlines though is abominable lark support; their example is slooow: https://github.com/normal-computing/outlines/blob/main/examples/parsing.py

ReTokenFilter.is_valid_token: partial=False

4c6b8cb

Matching a regex partially can lead to generating a token which causes the whole generated sequence to be invalid.

mattiasarro mentioned this pull request May 30, 2023

LLM generates text that the grammar does not allow r2d4/parserllm#4

Closed

mattiasarro changed the title ~~ReTokenFilter.is_valid_token: partial=False~~ Fix generating partially valid tokens May 31, 2023
