Discussion: Cookbook #18

jeffrose · 2019-09-16T15:33:26Z

You may want to start a cookbook, similar to Ramda's that shows ways of solving common problems.

While I have not worked with arcsecond for very long, I've already encountered issues where the basic use cases were easy to solve but the more advanced use cases forced me to take a completely different approach.

For example, padded digits. This works fine for most use cases...
const paddedDigits = sequenceOf( [ possibly( char( '-' ) ), possibly( many( char( '0' ) ) ), digits ] );
... until your input is only 0, then it fails. What's the best way to resolve that problem? Use choice with a char( '0' ) or is there another approach that's better?

Another example is parsing query strings. A basic parser is easy to throw together using sepBy, letters, and digits, but if you want to fully support the spec, you need to take a complete different approach to meet all the edge cases.

Now that you have a robust library in place, I think developer guidance is key to adoption.

The text was updated successfully, but these errors were encountered:

francisrstokes · 2019-09-18T08:25:45Z

Hey Jeff - completely agree.

In the 2.x rewrite I did recently I tried to get somewhere close to this with the tutorial, but with this tech there are many gotchas and pieces of accumulated wisdom.

I would like to keep this issue open as a place to track what could make it's way into such a cookbook.

To answer your question about padded digits, I would probably go with a regex here:

const paddedDigits = regex(/^-?0*[0-9]+/);

However, an approach more similar to your original would look like:

const paddedDigits = sequenceOf([
  possibly(char('-')),
  many(char('0'))
    .map(x => x.join(''))
    .chain(zeroResult => {
      if (zeroResult !== '') {
        return possibly(digits).map(digitResult => {
          if (digitResult) {
            return zeroResult + digitResult;
          }
          return zeroResult;
        });
      }
      return digits;
    }),
]).map(x => x.join(''));

This is a bit gnarly, but uses .chain to check if any padding zeros were parsed, and then return either digits or possibly(digits) depending on that result. Basically any time your parsing is dependent on the result, you're going to need to use chain. Luckily regex is a whole beast that takes care of some of that so you can express it much simpler.

Hope this helps a bit!

jeffrose · 2019-09-18T13:21:03Z

Using regex felt like a cheat to me under the circumstances, since I was trying to familiarize myself with the API.
I can tell you that this did work for me...

const paddedDigits = choice( [
    sequenceOf( [ possibly( char( '-' ) ), possibly( many( char( '0' ) ) ), digits ] ),
    char( '0' ).map( ( value ) => [ null, null, value ] )
] );

The one advantage this has over using regex is it lets you make decisions on the padding size, etc. by breaking out the padding from the value.
A more robust solution could look like...

const
    optionalSign = possibly( choice( [ char( '-' ), char( '+' ) ] ) ),
    zero = char( '0' ),
    zeroCount = possibly( many( zero ) ).map( ( value ) => value.length ),
    paddedDigits = choice( [
        sequenceOf( [ optionalSign, zeroCount, digits ] ),
        zero.map( ( value ) => [ null, 0, value ] )
    ] );
console.log( paddedDigits.run( '0003' ).result );   // [ null, 3, '3' ]
console.log( paddedDigits.run( '-0010' ).result );  // [ '-', 2, '10' ]
console.log( paddedDigits.run( '+00005' ).result ); // [ '+', 4, '5' ]

IMax153 · 2019-10-11T23:20:02Z

Hello Francis,

First of all, let me say that I am thoroughly enjoying the LLJS series on YouTube. I am very much looking forward to the next installment of the VM series.

I have been working on implementing a parser for the Lua programming language with the arcsecond library, but have been running into several issues that I have been unable to solve by referring to your examples/tutorial. I apologize if my question is naive, but I have only just begun using the library.

The most prominent issue occurs in the situation where two choice parsers implement one another and continuously call one another, resulting in an infinite loop. I have read your response to the issue posed by @justgage, and ensured that I have not implemented any parsers in that manner.

Hopefully you can get an idea of what I am referring to from the example code below, which I tried to simplify as much as possible.

// Variable
export const variable = recursiveParser(() =>
  whitespaceSurrounded(
    choice([
      sequenceOf([expressionPrefix, dot, name]),
      sequenceOf([expressionPrefix, betweenBrackets(expression)]),
    ]),
  ),
);

// Expression Prefix
export const expressionPrefix = recursiveParser(() =>
  choice([self, globalVariable, betweenParentheses(expression), variable]),
);

Very curious to hear your thoughts.

francisrstokes · 2019-10-12T08:12:52Z

Hi @IMax153.
It's tricky to say for sure without context, so just a couple of general things to keep in mind:

With choice, the main thing you need to be careful with is that it only works properly with parsers that can fail. An example of a kind of parser that cannot fail is many(something) - if it doesn't find any somethings, then no problem - the result is an empty array and not a failure. That doesn't mix properly with choice because choice relies on a parser failing before it can try the next one.

The other thing is try to look out for places that you might be defining Left recursion. You might need to change how you write the parser to ensure it doesn't get into an infinite loop.

Hope one or both of those help.

jeffrose · 2019-11-19T19:10:55Z

Here is my recipe for parsing document.cookie. It complies with most forgiving cookie spec since it's reading and not writing.

import { char, everythingUntil, regex, sepBy, sequenceOf, str, takeRight } from 'arcsecond';
import { fromPairs } from 'ramda';

const
    validCharacters = /^[^;,\s]*/,
    percentEncoded = /(%[0-9A-Za-z]{2})+/g,
    percentDecode = ( value ) => value.replace( percentEncoded, decodeURIComponent ),
    equalsSign = char( '=' ),
    cookie = sequenceOf( [
        everythingUntil( equalsSign ).map( percentDecode ),
        takeRight( equalsSign )( regex( validCharacters ) ).map( percentDecode )
    ] ),
    cookies = sepBy( str( '; ' ) )( cookie ).map( fromPairs );

console.log( cookies.run( 'a=123; b=456; c=%20' ).result );
//=> { "a": "123", "b": "456", "c": " " }

@francisrstokes, I originally had takeRight( equalsSign )( everythingUntil( endOfInput ) ).map( percentDecode ) but that didn't work. Should it have?

francisrstokes · 2019-12-05T12:24:31Z

Hey Jeff,

Sorry for the late response.
This is really nice - and very succinct. Perfect for a cookbook.

I'm not sure I understand where the
takeRight( equalsSign )( everythingUntil( endOfInput ) ).map( percentDecode )
parser would go - does it replace the first item in the cookie sequenceOf?

jeffrose · 2019-12-05T13:31:21Z

Hello Francis,

No problem.

Sorry, I wasn't clear. Instead of takeRight( equalsSign )( regex( validCharacters ) ).map( percentDecode ), which is responsible for parsing the cookie value, I had takeRight( equalsSign )( everythingUntil( endOfInput ) ).map( percentDecode ).

I thought it would work because at that point, it should be parsing each individual cookie, e.g. a=123. everythingUntil( equalsSign ).map( percentDecode ) parses the cookie name, leaving =123 behind. I wanted it to take anything after the = and use that as the cookie value, making it an extremely forgiving parser. While introducing validCharacters is not necessarily a bad thing, the code would be slightly more succinct without it.

francisrstokes · 2019-12-06T09:19:07Z

Ah I see. The reason it doesn't work is because the end of input really is the actual end of input, so it will parse until the end of the string if there are more key/values to come.

francisrstokes · 2019-12-06T09:24:59Z

To get concrete and start putting this into the docs - I think we can create a new folder in the project directory for cookbook, with an index.md that gives a brief introduction to the idea (more real world examples that help people get to grips with parser combinators), and then link off to the different sections.

Then each example can be a markdown file, which introduces the parser, shows the code, and gives a breif overview of any concepts involved (if chain is used for example, why, and what does it do).

Would you be up for kicking that off with a PR @jeffrose? We can work together on the intro page and formatting and all that kind of stuff.

jeffrose · 2019-12-07T01:57:52Z

@francisrstokes, sounds good. I am currently out of town, but when I get back I will open a PR.

IMax153 · 2019-12-07T16:37:20Z

@francisrstokes @jeffrose I would also be interested in contributing examples. I don’t want to steal Jeff’s thunder though, so please let me know how I can help. Perhaps I could work through some other examples for the library? Is there anything you are interested in showcasing?

IMax153 · 2019-12-07T16:55:31Z

Two issues I continually ran across when I was trying to use this library for parsing Lua were operator precedence and left-recursion.

Perhaps I could look into how existing work could be used to handle those scenarios? I know it is notoriously difficult to handle these situations with top-down parsing, but it would be an interesting experiment.

francisrstokes · 2019-12-08T10:48:55Z

@IMax153 Indeed - they are some of the trickiest topics in parsing, and ones people tend to bump into pretty quickly. Having a reference for those would be great.

IMax153 · 2019-12-09T23:54:15Z

@francisrstokes So I created an example of parsing binary expressions that supports associativity and precedence (see below). Is your vision for these examples to be markdown documents that showcase how the library can be utilized to accomplish these tasks? Or would you prefer I submit a PR to add this example to the examples directory of the project?

import util from 'util';

import {
  between,
  char,
  choice,
  digits,
  optionalWhitespace,
  many,
  many1,
  recursiveParser,
  sequenceOf,
} from "../..";

// These parsers are based heavily on the work documented at
// https://blog.jcoglan.com/2017/07/07/precedence-and-associativity-in-recursive-descent/

// Utility method to print out the entire array of parsed expressions to the console
const printResult = array =>
  console.log(util.inspect(array, { maxArrayLength: null, depth: null }));

const whitespaceSurrounded = parser =>
  between(optionalWhitespace)(optionalWhitespace)(parser);

const betweenParentheses = parser =>
  between(whitespaceSurrounded(char('(')))(whitespaceSurrounded(char(')')))(parser);

const plus = char('+');
const minus = char('-');
const times = char('*');
const divide = char('/');

const binaryExpression = op => parser =>
    sequenceOf([
      whitespaceSurrounded(parser),
      many1(sequenceOf([whitespaceSurrounded(op), whitespaceSurrounded(parser)]))
    ])
    .map(([initialTerm, expressions]) =>
      // Flatten the expressions
      [initialTerm, ...expressions].reduce((acc, curr) =>
        // Reduce the array into a left-recursive tree
        Array.isArray(curr) ? [curr[0], acc, curr[1]] : curr
      )
    );

const factor = recursiveParser(() => choice([digits, betweenParentheses(expression)]));
const term = recursiveParser(() => choice([multiplicationOrDivision, factor]));
const expression = recursiveParser(() => choice([additionOrSubtraction, term]));
const additionOrSubtraction = binaryExpression(choice([plus, minus]))(term);
const multiplicationOrDivision = binaryExpression(choice([times, divide]))(factor);

// Handle precedence of multiplication and division over addition and subtraction
printResult(many(expression).run('9 + 5 - 4 * 4 / 3').result);
// Result: [ [ '-', [ '+', '9', '5' ], [ '/', [ '*', '4', '4' ], '3' ] ] ]

// Handle precedence of expressions in parentheses
printResult(many(expression).run('9 + (5 - 4) * (4 / 3)').result)
// Result: [ [ '+', '9', [ '*', [ '-', '5', '4' ], [ '/', '4', '3' ] ] ] ]

IMax153 · 2019-12-13T21:33:47Z

Hey guys - just wanted to check ahead of Jeff’s pull request to create the Cookbook file. Is the example above succinct enough for the cookbook?

francisrstokes · 2019-12-14T09:49:25Z

Hey Max - yes I believe so. I merged Jeff's example this morning, so if you want to make a PR, you can use what is there now in master to add your example.

Two things: I would strip out anything not directly related to your example though, just to cut down on noise - so the util/printResult stuff.
And make sure there is couple of sentences explaining the problem and the approach, to put the code in context.

And of course thanks for taking the interest and time to do this 👍

crapthings · 2021-11-27T11:49:31Z

Hello, how to match newline \n?

I want to match something like this

   \n hello \n kitty        \n  test     \n something

the whitespace can be everywhere, use newline \n to sep value

I've tried

const A = require('arcsecond')

const test = A.sepBy(A.char('\n'))(A.between(A.optionalWhitespace)(A.optionalWhitespace)(A.letters))

const result = test.run('big \n list \n of \n                  words')

console.dir(result, { depth: null })

but it won't work

francisrstokes · 2021-11-27T13:50:38Z

Hey @crapthings - it's not working because the newlines you want to use to separate are being caught by the optionalWhitespace. The easy solution is to create an optional whitespace parser that doesn't include newlines. Something like this:

const customWhitespace = A.possibly(A.regex(/^( |\t|\r)*/));

francisrstokes changed the title ~~Cookbook~~ Discussion: Cookbook Dec 10, 2019

jeffrose added a commit to jeffrose/arcsecond that referenced this issue Dec 11, 2019

Initial cookbook draft per francisrstokes#18

410a8b7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discussion: Cookbook #18

Discussion: Cookbook #18

jeffrose commented Sep 16, 2019

francisrstokes commented Sep 18, 2019

jeffrose commented Sep 18, 2019 •

edited

Loading

IMax153 commented Oct 11, 2019

francisrstokes commented Oct 12, 2019

jeffrose commented Nov 19, 2019 •

edited

Loading

francisrstokes commented Dec 5, 2019 •

edited

Loading

jeffrose commented Dec 5, 2019

francisrstokes commented Dec 6, 2019

francisrstokes commented Dec 6, 2019

jeffrose commented Dec 7, 2019

IMax153 commented Dec 7, 2019

IMax153 commented Dec 7, 2019

francisrstokes commented Dec 8, 2019

IMax153 commented Dec 9, 2019

IMax153 commented Dec 13, 2019

francisrstokes commented Dec 14, 2019

crapthings commented Nov 27, 2021

francisrstokes commented Nov 27, 2021

Discussion: Cookbook #18

Discussion: Cookbook #18

Comments

jeffrose commented Sep 16, 2019

francisrstokes commented Sep 18, 2019

jeffrose commented Sep 18, 2019 • edited Loading

IMax153 commented Oct 11, 2019

francisrstokes commented Oct 12, 2019

jeffrose commented Nov 19, 2019 • edited Loading

francisrstokes commented Dec 5, 2019 • edited Loading

jeffrose commented Dec 5, 2019

francisrstokes commented Dec 6, 2019

francisrstokes commented Dec 6, 2019

jeffrose commented Dec 7, 2019

IMax153 commented Dec 7, 2019

IMax153 commented Dec 7, 2019

francisrstokes commented Dec 8, 2019

IMax153 commented Dec 9, 2019

IMax153 commented Dec 13, 2019

francisrstokes commented Dec 14, 2019

crapthings commented Nov 27, 2021

francisrstokes commented Nov 27, 2021

jeffrose commented Sep 18, 2019 •

edited

Loading

jeffrose commented Nov 19, 2019 •

edited

Loading

francisrstokes commented Dec 5, 2019 •

edited

Loading