-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion: Cookbook #18
Comments
Hey Jeff - completely agree. In the 2.x rewrite I did recently I tried to get somewhere close to this with the tutorial, but with this tech there are many gotchas and pieces of accumulated wisdom. I would like to keep this issue open as a place to track what could make it's way into such a cookbook. To answer your question about padded digits, I would probably go with a regex here: const paddedDigits = regex(/^-?0*[0-9]+/); However, an approach more similar to your original would look like: const paddedDigits = sequenceOf([
possibly(char('-')),
many(char('0'))
.map(x => x.join(''))
.chain(zeroResult => {
if (zeroResult !== '') {
return possibly(digits).map(digitResult => {
if (digitResult) {
return zeroResult + digitResult;
}
return zeroResult;
});
}
return digits;
}),
]).map(x => x.join('')); This is a bit gnarly, but uses Hope this helps a bit! |
Using const paddedDigits = choice( [
sequenceOf( [ possibly( char( '-' ) ), possibly( many( char( '0' ) ) ), digits ] ),
char( '0' ).map( ( value ) => [ null, null, value ] )
] ); The one advantage this has over using const
optionalSign = possibly( choice( [ char( '-' ), char( '+' ) ] ) ),
zero = char( '0' ),
zeroCount = possibly( many( zero ) ).map( ( value ) => value.length ),
paddedDigits = choice( [
sequenceOf( [ optionalSign, zeroCount, digits ] ),
zero.map( ( value ) => [ null, 0, value ] )
] );
console.log( paddedDigits.run( '0003' ).result ); // [ null, 3, '3' ]
console.log( paddedDigits.run( '-0010' ).result ); // [ '-', 2, '10' ]
console.log( paddedDigits.run( '+00005' ).result ); // [ '+', 4, '5' ] |
Hello Francis, First of all, let me say that I am thoroughly enjoying the LLJS series on YouTube. I am very much looking forward to the next installment of the VM series. I have been working on implementing a parser for the Lua programming language with the arcsecond library, but have been running into several issues that I have been unable to solve by referring to your examples/tutorial. I apologize if my question is naive, but I have only just begun using the library. The most prominent issue occurs in the situation where two choice parsers implement one another and continuously call one another, resulting in an infinite loop. I have read your response to the issue posed by @justgage, and ensured that I have not implemented any parsers in that manner. Hopefully you can get an idea of what I am referring to from the example code below, which I tried to simplify as much as possible. // Variable
export const variable = recursiveParser(() =>
whitespaceSurrounded(
choice([
sequenceOf([expressionPrefix, dot, name]),
sequenceOf([expressionPrefix, betweenBrackets(expression)]),
]),
),
);
// Expression Prefix
export const expressionPrefix = recursiveParser(() =>
choice([self, globalVariable, betweenParentheses(expression), variable]),
); Very curious to hear your thoughts. |
Hi @IMax153. With The other thing is try to look out for places that you might be defining Left recursion. You might need to change how you write the parser to ensure it doesn't get into an infinite loop. Hope one or both of those help. |
Here is my recipe for parsing import { char, everythingUntil, regex, sepBy, sequenceOf, str, takeRight } from 'arcsecond';
import { fromPairs } from 'ramda';
const
validCharacters = /^[^;,\s]*/,
percentEncoded = /(%[0-9A-Za-z]{2})+/g,
percentDecode = ( value ) => value.replace( percentEncoded, decodeURIComponent ),
equalsSign = char( '=' ),
cookie = sequenceOf( [
everythingUntil( equalsSign ).map( percentDecode ),
takeRight( equalsSign )( regex( validCharacters ) ).map( percentDecode )
] ),
cookies = sepBy( str( '; ' ) )( cookie ).map( fromPairs );
console.log( cookies.run( 'a=123; b=456; c=%20' ).result );
//=> { "a": "123", "b": "456", "c": " " } @francisrstokes, I originally had |
Hey Jeff, Sorry for the late response. I'm not sure I understand where the |
Hello Francis, No problem. Sorry, I wasn't clear. Instead of I thought it would work because at that point, it should be parsing each individual cookie, e.g. |
Ah I see. The reason it doesn't work is because the end of input really is the actual end of input, so it will parse until the end of the string if there are more key/values to come. |
To get concrete and start putting this into the docs - I think we can create a new folder in the project directory for cookbook, with an Then each example can be a markdown file, which introduces the parser, shows the code, and gives a breif overview of any concepts involved (if chain is used for example, why, and what does it do). Would you be up for kicking that off with a PR @jeffrose? We can work together on the intro page and formatting and all that kind of stuff. |
@francisrstokes, sounds good. I am currently out of town, but when I get back I will open a PR. |
@francisrstokes @jeffrose I would also be interested in contributing examples. I don’t want to steal Jeff’s thunder though, so please let me know how I can help. Perhaps I could work through some other examples for the library? Is there anything you are interested in showcasing? |
Two issues I continually ran across when I was trying to use this library for parsing Lua were operator precedence and left-recursion. Perhaps I could look into how existing work could be used to handle those scenarios? I know it is notoriously difficult to handle these situations with top-down parsing, but it would be an interesting experiment. |
@IMax153 Indeed - they are some of the trickiest topics in parsing, and ones people tend to bump into pretty quickly. Having a reference for those would be great. |
@francisrstokes So I created an example of parsing binary expressions that supports associativity and precedence (see below). Is your vision for these examples to be markdown documents that showcase how the library can be utilized to accomplish these tasks? Or would you prefer I submit a PR to add this example to the examples directory of the project? import util from 'util';
import {
between,
char,
choice,
digits,
optionalWhitespace,
many,
many1,
recursiveParser,
sequenceOf,
} from "../..";
// These parsers are based heavily on the work documented at
// https://blog.jcoglan.com/2017/07/07/precedence-and-associativity-in-recursive-descent/
// Utility method to print out the entire array of parsed expressions to the console
const printResult = array =>
console.log(util.inspect(array, { maxArrayLength: null, depth: null }));
const whitespaceSurrounded = parser =>
between(optionalWhitespace)(optionalWhitespace)(parser);
const betweenParentheses = parser =>
between(whitespaceSurrounded(char('(')))(whitespaceSurrounded(char(')')))(parser);
const plus = char('+');
const minus = char('-');
const times = char('*');
const divide = char('/');
const binaryExpression = op => parser =>
sequenceOf([
whitespaceSurrounded(parser),
many1(sequenceOf([whitespaceSurrounded(op), whitespaceSurrounded(parser)]))
])
.map(([initialTerm, expressions]) =>
// Flatten the expressions
[initialTerm, ...expressions].reduce((acc, curr) =>
// Reduce the array into a left-recursive tree
Array.isArray(curr) ? [curr[0], acc, curr[1]] : curr
)
);
const factor = recursiveParser(() => choice([digits, betweenParentheses(expression)]));
const term = recursiveParser(() => choice([multiplicationOrDivision, factor]));
const expression = recursiveParser(() => choice([additionOrSubtraction, term]));
const additionOrSubtraction = binaryExpression(choice([plus, minus]))(term);
const multiplicationOrDivision = binaryExpression(choice([times, divide]))(factor);
// Handle precedence of multiplication and division over addition and subtraction
printResult(many(expression).run('9 + 5 - 4 * 4 / 3').result);
// Result: [ [ '-', [ '+', '9', '5' ], [ '/', [ '*', '4', '4' ], '3' ] ] ]
// Handle precedence of expressions in parentheses
printResult(many(expression).run('9 + (5 - 4) * (4 / 3)').result)
// Result: [ [ '+', '9', [ '*', [ '-', '5', '4' ], [ '/', '4', '3' ] ] ] ] |
Hey guys - just wanted to check ahead of Jeff’s pull request to create the Cookbook file. Is the example above succinct enough for the cookbook? |
Hey Max - yes I believe so. I merged Jeff's example this morning, so if you want to make a PR, you can use what is there now in master to add your example. Two things: I would strip out anything not directly related to your example though, just to cut down on noise - so the util/printResult stuff. And of course thanks for taking the interest and time to do this 👍 |
Hello, how to match newline \n? I want to match something like this
the whitespace can be everywhere, use newline \n to sep value I've tried
but it won't work |
Hey @crapthings - it's not working because the newlines you want to use to separate are being caught by the const customWhitespace = A.possibly(A.regex(/^( |\t|\r)*/)); |
You may want to start a cookbook, similar to Ramda's that shows ways of solving common problems.
While I have not worked with arcsecond for very long, I've already encountered issues where the basic use cases were easy to solve but the more advanced use cases forced me to take a completely different approach.
For example, padded digits. This works fine for most use cases...
const paddedDigits = sequenceOf( [ possibly( char( '-' ) ), possibly( many( char( '0' ) ) ), digits ] );
... until your input is only
0
, then it fails. What's the best way to resolve that problem? Usechoice
with achar( '0' )
or is there another approach that's better?Another example is parsing query strings. A basic parser is easy to throw together using
sepBy
,letters
, anddigits
, but if you want to fully support the spec, you need to take a complete different approach to meet all the edge cases.Now that you have a robust library in place, I think developer guidance is key to adoption.
The text was updated successfully, but these errors were encountered: