Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex matching in a rope #23

Open
muxcmux opened this issue Oct 28, 2024 · 4 comments
Open

Regex matching in a rope #23

muxcmux opened this issue Oct 28, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@muxcmux
Copy link

muxcmux commented Oct 28, 2024

Hi, this isn't an issue but more of an ask for some pointers on how to make finding regex matches in a crop::Rope work.

I checked out the regex crate but its api seems to be mainly designed around working with &strs . They recommended to use the lower level regex-automata when:

You want to use the lower level search APIs. For example, both the lazy DFA and fully compiled
DFAs support searching by exploring the automaton one state at a time. This might be useful, for
example, for stream searches or searches of strings stored in non-contiguous in memory.

This is what Helix editor is doing for regex search/replace. They have created a regex-cursor lib and in it they also provide an implementation of their Cursor trait for use with ropey. From what I can tell, this is used to create a regex Input with their RopeyCursor (which impls said Cursor trait) from a RopeSlice. I had a look at the implementation and it seems quite straightforward.

It takes a chunks iterator (also available in crop) and then just stores internally a reference to the current chunk's bytes and offset, changing them as advance and backtrack methods are called on the cursor. To advance the cursor, it delegates to the chunk iterator's next method, but in order to backtrack the cursor, it calls a prev() method on ropey's chunk iterator.

And this is where I got stuck. I tried to create a similar CropCursor in my program which implements the Cursor trait from the regex-cursor crate, but I don't know how to "move backwards" crop's Chunks iterator in a similar way to ropey's one.

Internally it delegates to Tree -> Leaves, and from there, I can't tell what is going on without studying crop's code in more detail.

Ropey also has a public method which can return a chunk at a byte offset, which seems to be used in Helix to match a regex on a partial document.

So I was hoping you could shed some light on what would be a sensible approach to implement Regex matching on a Rope. Should I simply .to_string() the rope, pass that to a Regex and call it a day? Or should I pursue this effort of implementing the Cursor trait from regex-cursor, in which case how do I go about moving crop's Chunk iterator backwards, and getting a chunk at a byte offset? Or maybe there is another way to do this, which I haven't considered?

Either way, thank you for your time and for your work on this library.

@noib3
Copy link
Collaborator

noib3 commented Oct 29, 2024

I'm open to adding those functionalities; however, I don't think it should be done via the Chunks iterator. In Rust, iterators are usually assumed to never yield the same item twice, and I fear that having both a prev() method and a DoubleEndedIterator implementation on the same struct would be quite confusing, especially for new users.

Instead, a better approach could be to add a Rope{Slice}::cursor() method which returns a Cursor struct. Its API would be similar to the Seek trait in std, with methods to seek to a given offset in the Rope, return the current chunk, advance/backtrack to the next/previous chunk, etc. What do you think?

@noib3 noib3 added the enhancement New feature or request label Oct 29, 2024
@muxcmux
Copy link
Author

muxcmux commented Oct 29, 2024

In Rust, iterators are usually assumed to never yield the same item twice, and I fear that having both a prev() method and a DoubleEndedIterator implementation on the same struct would be quite confusing, especially for new users.

Not gonna lie, when I first saw prev() on the Chunks iterator in ropey, I immediately assumed this was just a method that you get when implementing DoubleEndedIterator on your struct :D, but to be fair to ropey - they do a good job explaining reverse iteration capabilities of their iterators in the docs

What you are suggesting with the cursor sounds exactly like the missing piece needed to implement the Cursor trait from the regex-cursor lib.

@noib3
Copy link
Collaborator

noib3 commented Oct 30, 2024

Do you want to have a go at it? I might get around to it but can't guarantee when.

@muxcmux
Copy link
Author

muxcmux commented Oct 30, 2024

I was going to have a go at it, yeah. However, my Rust is pretty rubbish, so I will need some hand holding. It may also take me a while to work out how Crop internals work.

Will open a PR for comments at some point if you haven't done that before then.

Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants