
Some things that crash or give the wrong result #4

Open
NicholasSterling opened this issue May 13, 2022 · 10 comments

@NicholasSterling
Contributor

I love this project, and very much appreciate the effort that went into it. Kudos!

I'm not sure whether this is intended for actual use or just a simple teaching exercise, but I thought I'd mention a few things that failed (for some definition of "fail", e.g. crash or do something unexpected).

12 -- expects an outer set of parens

(4) -- evals to (4), not 4; (+ 3 1) evals to 4

(print "foo) -- prints foo)

(") -- evals to ())

Are you open to pull requests? There are also a few things that could be simplified.

@vishpat
Owner

vishpat commented May 19, 2022

@NicholasSterling, please create a pull request against the main branch.

@NicholasSterling
Contributor Author

Done. The PR does not attempt to fix any of the errors, just suggested simplifications.

By the way, the lexer converts the UTF-8 String to a Vec of chars and then processes the characters one at a time, from the beginning, removing each character as it goes. Removing a character from the front of a Vec requires shifting all of the remaining characters left, so processing an N-character input takes O(N^2) time.
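A minimal sketch of the pattern being described (not the actual lisp-rs code): popping characters off the front of a Vec gives the right characters in the right order, but each `remove(0)` shifts everything that remains, so the total work grows quadratically with input length.

```rust
// Demonstrates why consuming a Vec from the front is O(N^2):
// Vec::remove(0) shifts every remaining element left, so popping
// N characters touches roughly N^2/2 elements in total.
fn consume_from_front(input: &str) -> String {
    let mut chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    while !chars.is_empty() {
        out.push(chars.remove(0)); // each call is O(n) in the Vec's length
    }
    out
}

fn main() {
    // The characters come out in order, but at quadratic cost.
    assert_eq!(consume_from_front("(+ 1 2)"), "(+ 1 2)");
}
```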

@NicholasSterling
Contributor Author

NicholasSterling commented May 24, 2022

@vishpat I am pretty sure you can fix the N^2 lexing by getting an Iterator for the input string and calling peekable() so that you can peek at the next element. I could do this for you, but since my other PR is just sitting there and you are making further changes to the lexer, I'll wait for you to resolve all of that.

If you decide to do this yourself, I'm suggesting getting rid of the Vec in tokenize() and doing

let mut iter = input.chars().peekable();

Then when you need characters, you can do

while let Some(ch) = iter.next() { ... }

or just peek at the next char with iter.peek().
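Put together, a linear-time tokenizer loop along these lines might look like the following. This is an illustrative sketch, not the actual lisp-rs tokenize() function; the token categories are simplified to parens and whitespace-delimited words.

```rust
// A minimal sketch of a linear-time tokenizer using a peekable char iterator.
// Each character is visited at most twice (once peeked, once consumed),
// so the whole pass is O(N) instead of O(N^2).
fn tokenize(input: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut iter = input.chars().peekable();
    while let Some(ch) = iter.next() {
        match ch {
            '(' | ')' => tokens.push(ch.to_string()),
            c if c.is_whitespace() => {} // skip whitespace
            c => {
                // Accumulate a word token, peeking ahead without consuming
                // until we hit whitespace or a paren.
                let mut word = c.to_string();
                while let Some(&next) = iter.peek() {
                    if next.is_whitespace() || next == '(' || next == ')' {
                        break;
                    }
                    word.push(next);
                    iter.next(); // consume the peeked char
                }
                tokens.push(word);
            }
        }
    }
    tokens
}

fn main() {
    assert_eq!(tokenize("(+ 3 1)"), vec!["(", "+", "3", "1", ")"]);
}
```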

If you want me to do it, just give me the all-clear when the lexer dust has settled.

@jcubic

jcubic commented May 31, 2022

(4) should throw an error: applying a non-function.

@vishpat
Owner

vishpat commented May 31, 2022

@NicholasSterling consider the lexer dust settled. Please open a PR.

@NicholasSterling
Contributor Author

@vishpat OK, thanks -- will do so by the end of the weekend.

@NicholasSterling
Contributor Author

OK, that was too optimistic. I'll give an update in a few days.

@NicholasSterling
Contributor Author

NicholasSterling commented Jun 20, 2022

I'm overdue for an update. I decided to do something more sophisticated, with these goals:

  1. Fix the N^2 problem.
  2. Only allocate memory when necessary (e.g. not for keywords or numbers).
  3. Make line and column number info available, for diagnostics.

I looked on crates.io for something that would help. Tokenator looked promising but seemed a bit clumsy to use, so I decided to take a different approach: I have been working on a new crate that helps one write tokenizers. It appears to be working, but I need to do more testing.
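To illustrate goals 2 and 3, here is a hypothetical token representation (not the actual token-iter API): positions are carried alongside each token for diagnostics, and keywords/numbers avoid heap allocation by being parsed in place or borrowing from the input.

```rust
// Hypothetical token types illustrating allocation-free tokens with
// line/column info. This is a sketch, not the token-iter crate's API.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    LParen,
    RParen,
    Integer(i64),    // parsed in place, no String allocation
    Symbol(&'a str), // borrows from the input, no allocation
}

#[derive(Debug, PartialEq)]
struct Positioned<'a> {
    line: u32,   // 1-based line number, for diagnostics
    column: u32, // 1-based column number
    token: Token<'a>,
}

fn main() {
    let input = "foo";
    let tok = Positioned { line: 1, column: 5, token: Token::Symbol(&input[..]) };
    assert_eq!(tok.token, Token::Symbol("foo"));
    assert_eq!((tok.line, tok.column), (1, 5));
}
```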

@NicholasSterling
Contributor Author

@vishpat After much thrashing over the API, the tokenizer crate I have been working on (token-iter) is ready. If you are OK with it, I would like to make lisp-rs use it, let you check it out, and then publish the crate. If you would rather not use a separate crate for tokenization, I understand, but please let me know either way.

https://github.com/NicholasSterling/token-iter/blob/main/src/lib.rs

@NicholasSterling
Contributor Author

@vishpat Have you had a chance to look at the token-iter crate? Are you OK with depending on an external crate for lexical analysis?
https://docs.rs/token-iter/latest/token_iter/
