Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #3921 & #2342: inline function without parentheses used in condition, inline 'else' #4838

Merged
merged 7 commits into from
Jan 29, 2018

Conversation

zdenko
Copy link
Collaborator

@zdenko zdenko commented Dec 31, 2017

Fixes #3921 and #2342.
Compile fails when inline function is used in condition (control flow): if, unless, while, until, switch.

Example cases:

# [1]
if f (a) -> a is 1
  'ok'

# [2]
res = if f (a) -> a is 1 then 'ok' else 'ko'

# [3]
while do (s = i) -> a -= s
  foo(a)

# [4]
i = 1
a = switch do (m = 2) -> i * m
  when 2 then "two"
  when 1 then "one"
  else "none"

[1], [2], [3], [4]

The main cause of the issue is incorrect positions of INDENT and OUTDENT tag pair inserted in Rewriter::normalizeLines.
Example:

if f -> a
  'ok'

###
IF/if IDENTIFIER/f ->/-> INDENT/2 IDENTIFIER/a INDENT/2 STRING/'ok' OUTDENT/2 OUTDENT/2
                         ^                                                    ^
                         *                                                    *
###

The second OUTDENT should be placed between a and INDENT.
Besides that, @tagPostfixConditionals also converts IF into POST_IF tag in such cases.

At first, I’ve tried to adjust the conditions in @normalizeLines to correct the INDENT/OUTDENT position. I managed to get it work for cases where newline follows inline function, but I couldn’t solve cases with inline condition (if -> a then) which are too ambiguous.
So, at the end, I decided to add new method @addParensToConditions which wraps the function in parentheses.

The better solution would be controlling the output in grammar and nodes, but currently I don’t see it as an option, since Expression used in grammar is too broad, which means that rules for If, While and Switch nodes would have to be split into more detailed rules, e.g. for Value, Code, Assign, …

@GeoffreyBooth
Copy link
Collaborator

Why wouldn’t extending the grammar be an option? When I look at Expression, I see only this:

  Expression: [
    o 'Value'
    o 'Code'
    o 'Operation'
    o 'Assign'
    o 'If'
    o 'Try'
    o 'While'
    o 'For'
    o 'Switch'
    o 'Class'
    o 'Throw'
    o 'Yield'
  ]

Several of these don’t make much sense to allow inside an if conditional (throw, yield, maybe if, class). But even if we allowed all of them, couldn’t the grammar distinguish between items that require being wrapped in parentheses versus items that don’t? So something like:

IfBlock: [
  o 'IF SimpleConditional Block',                    -> new If $2, $3, type: $1
  o 'IfBlock ELSE IF SimpleConditional Block',       -> $1.addElse LOC(3,5) new If $4, $5, type: $3
  o 'IF ( ComplexConditional ) Block',               -> new If $3, $4, type: $1
  o 'IfBlock ELSE IF ( ComplexConditional ) Block',  -> $1.addElse LOC(3,7) new If $5, $7, type: $3
]

And then we’d copy Expression into two new blocks, SimpleConditional and ComplexConditional, based on whether the conditional needs to be wrapped in parentheses or not. Code, for example, would need to be wrapped in parentheses. I don’t think it’s a requirement that ambiguous things like inline functions need to be allowed in conditionals without being wrapped in parentheses.

@GeoffreyBooth
Copy link
Collaborator

Also I wonder if we could fix any of the other if-related bugs at the same time: #3933, #3921, #3441, #2342.

@vendethiel
Copy link
Collaborator

Fixing this in the grammar would be much better yeah.

WRT other if bugs, see satyr/coco#127 as it has quite a bit of discussion

@zdenko
Copy link
Collaborator Author

zdenko commented Jan 4, 2018

OK, I bite the bullet, discarded the @addParensToConditions method from the Rewriter and implement new rules in the grammar instead.
There are quite a few changes, so I'll try to explain the problem and solution.

As said before, the problem with the inline functions being used in conditional and control flow (If, While, For, Switch, ...), was an improper implicit placement of the INDENT and OUTDENT tags in the Rewriter::normalizeLines.

For example: f -> a would be rewritten to

f ->
  a
### Tokens
IDENTIFIER -> INDENT IDENTIFIER OUTDENT 
###

And the final output from the rewriter is

f (-> 
   a
)
### Tokens
IDENTIFIER CALL_START -> INDENT IDENTIFIER OUTDENT CALL_END
###

Grammar breaks this expression into

Value
 Invocation 
   Value 
     Identifier "f"
   Arguments
     ArgList
       Code
         FuncGlyph "->"
         Block "INDENT a OUTDENT"

And, in this example if f -> a then b, the code will be broken into

if f ( ->
      a
      b
   )
If
  Value
   Invocation 
     Value 
       Identifier "f"
     Arguments
       ArgList
         Code
           FuncGlyph "->"
           Block "INDENT a INDENT b OUTDENT OUTDENT"

Essentially, the problem is misinterpreted inline function (-> a) when used as an argument to function call.

The initial idea for solving this, was adjusting @normalizeLine and place INDENT and OUTDENT in proper places, or adding the new method which will wrap f -> a in braces.

The former would mean adding more condition rules when scanning tokens, the latter added just another unnecessary loop over the tokens.

By looking closer at grammar it's clear that the main reason @normalizeLine inserts INDENT after the ->, is because the grammar rule for the Code expects Block which is basically indented Body.
So, to solve this we need a grammar rule which will also accept -> Expression.
This will enable writing inline functions without braces.

if do -> a then b

if do -> 'wrong' then do -> 'right'

c if a = f -> b

for a in do -> b
  a
  
while do (a = 1) -> f a
  c   
 
while b = f -> a when a isnt 3
  foo b
 
c = switch b = f -> a
  when 2 then "two"
  when 1 then "one"
  else "none"

c = switch
  when do -> a 
      'ok'
  when do -> b
      'ko'
  

Besides control flow, the same issue also appears when used in ranges, slices and array definition

a = [do -> b...5]
a = arr[do -> b...5]
arr = [1, 2, 3, do -> a, 5, 6]

Note that examples above works in current master when inline function is the last element:

a = [1...do -> b]
a = arr[1...do -> b]
arr = [1, 2, 3, do -> a]

Besides new grammar rules, I also changed existing @normalizeLines in order to avoid inserting INDENT / OUTDENT when the inline function is detected with control flow tags or inside the array, range, and slice.

This PR is still WIP as there are still some cases (especially Operation) which need refinements and testing.

@zdenko
Copy link
Collaborator Author

zdenko commented Jan 4, 2018

@GeoffreyBooth
#3933 and #3441 are not part of this PR. I'll have a PR for the #3441 ready soon.
However, #2342 might be fixed once I finished all changes in grammar.

@vendethiel
Copy link
Collaborator

Would I be right saying this could potentially fix a -> b, c to mean a (-> b), c then? As per the previously linked ticket.

@zdenko
Copy link
Collaborator Author

zdenko commented Jan 4, 2018

@vendethiel which ticket exactly? In your example a -> b, c is changed in Rewriter::addImplicitBracesAndParens, which I also intend to refactor later.

I just did a quick test locally (not committed yet) and a -> b, c compiles to

a(function() {
  return b;
}, c);

and this case f a -> b, c to

f(a(function() {
  return b;
}, c));

Is this expected?

@vendethiel
Copy link
Collaborator

This ticket: satyr/coco#127

I'd say the compilation is correct. It's "unexpected", because we currently parse it as a (-> b, c) which is an error.

The question I have then is: did this change the meaning of these cases:

if true then if false
   true

Does this code now compile?

switch
when a then if b then c else d

@vendethiel
Copy link
Collaborator

I know, it's a bit broader than what the ticket attempts to do, but it's one big can of worms..

#2342

@zdenko
Copy link
Collaborator Author

zdenko commented Jan 4, 2018

Yeah, I was looking at those also and I think this PR could also cover such cases.

@zdenko zdenko changed the title Fix #3921: inline function without parentheses used in condition Fix #3921 & #2342: inline function without parentheses used in condition, inline 'else' Jan 7, 2018
@@ -485,6 +485,293 @@ test "#4267: lots of for-loops in the same scope", ->
"""
ok CoffeeScript.eval(code)

# Test for issue #2342: Lexer: Inline `else` binds to wrong `if`/`switch`
test "issue 2343: if / then / if / then / else", ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very minor, but elsewhere in the tests when we reference an issue we do it as test "#2343: ....

@zdenko
Copy link
Collaborator Author

zdenko commented Jan 7, 2018

The last commit includes the fix for #2342.

 # if a then if b then c else d

if (a) {
  if (b) {
    c;
  } else {
    d;
  }
}
# switch e
#   when f then if g then h else i

switch (e) {
  case f:
    if (g) {
      h;
    } else {
      i;
    }
}

I've tried to fix this in grammar and nodes by having a separate set of rules for inline if: if a then b else c. The main idea was (is) to keep then, i.e. not replacing it with INDENT/OUTDENT.
It worked to some extent, but at the end, I was walking in circles of ambiguous cases.
From my understanding, the grammar solution would require having rules for IF Block and IF Line, which ultimately means a major reworking of the Expression.
So, at least for now, I solved this by adjusting Rewriter::normalizeLines where INDENT/OUTDENT are inserted.

@GeoffreyBooth
Copy link
Collaborator

The more I learn about this codebase, the more I feel like rewriter.coffee really never should’ve existed in the first place; that it’s just a long series of hacks to get around not achieving what we want in the grammar. Alas, it’s more than a little late now. I think that’s what CoffeeScriptRedux was aiming to fix.

Re if a then if b then c else d, why is this not ambiguous? Like, how would I know that the else goes with if b and not if a? I surmise that you’re introducing a rule that else always binds with the closest prior then, but will this potentially break old code that expected otherwise?

@Inve1951
Copy link
Contributor

Inve1951 commented Jan 7, 2018

binding to closest if is how other languages handle dangling else.
i'd go as far as calling it the expected and most logical behavior.

in cases where you want if a then (if b then c) else d you can use parens or use multiple lines and indent appropriately. imo that's sufficient.

another possibility could be using a semicolon: if a then if b then c; else d
i don't think that's a good idea tho as it's easy to miss and harder to grasp the meaning.

@GeoffreyBooth
Copy link
Collaborator

I think the behavior proposed here (binding to closest then) is the best option; second-best would be throwing an error that it’s ambiguous. I’m just asking adopting this rule now is a breaking change.

@zdenko
Copy link
Collaborator Author

zdenko commented Jan 7, 2018

Binding else to closest then is the way to go IMHO.
Note that this PR fixes cases like if a then if b then d else c, but we should also take care of this case if a then if b then c else d else e.
I would expect this output

if (a) {
  if (b) {
    return c
  } else {
    return d
  }
} else {
  return e
}

The more I learn about this codebase, the more I feel like rewriter.coffee really never should’ve existed in the first place

Count me in. I'm working on improved grammar (not part of this PR), and I think it will be possible to get rid of most of the current logic in the rewriter.

@Inve1951
Copy link
Contributor

Inve1951 commented Jan 7, 2018

i assume you flipped c and d by accident.
besides that i agree with the output.

@GeoffreyBooth
Copy link
Collaborator

Okay, but is this a breaking change?

@zdenko
Copy link
Collaborator Author

zdenko commented Jan 7, 2018

@Inve1951 yes, c and d were flipped. I corrected the comment.
@GeoffreyBooth I would say this is a breaking change.
Any existing code with if false then if false then 1 else 2 that expects the result to be 2 will now get undefined.

@vendethiel
Copy link
Collaborator

The more I learn about this codebase, the more I feel like rewriter.coffee really never should’ve existed in the first place; that it’s just a long series of hacks to get around not achieving what we want in the grammar. Alas, it’s more than a little late now. I think that’s what CoffeeScriptRedux was aiming to fix.

Yes, pretty much everyone agrees on this. Do note, tho, that CSR has some minor rewriting (notably regexes).

Re if a then if b then c else d, why is this not ambiguous? Like, how would I know that the else goes with if b and not if a? I surmise that you’re introducing a rule that else always binds with the closest prior then, but will this potentially break old code that expected otherwise?

It's not ambiguous, but i'm afraid this PR changing its meaning (tho for the best) is a huge breaking change.

@GeoffreyBooth
Copy link
Collaborator

I’m okay with a breaking change that fixes a bug, like when we changed chaining in 1.12.7 to go from the start of a line rather than the last function in the line. But when we make such a change, we need to be sure that the prior behavior is a bug, and clearly seems to be such. I think this qualifies; what do you all think?

I would disagree that this is a “huge” breaking change; to me, relying on multiple thens in the same line smacks of ambiguity, so I would hope it wouldn’t be common (or that people would add parentheses to make their intentions clear). It’s only one data point, but we didn’t need to change anything across the CoffeeScript codebase as a result of this change, did we?

@GeoffreyBooth
Copy link
Collaborator

Survey of the crowd (including @jashkenas, @lydell, @connec, others not already on this thread):

A bunch of bugfix PRs have been merged in recently, some of them substantial changes, so I’m thinking the next release should be 2.2.0. That should give people enough of a hint that it’s not just a small patch. In addition, this PR is technically a breaking change: else previously bound to the earliest then but now binds to the latest then, which makes much more sense. So code like these are changing:

if no then if yes then alert 1 else alert 2
OldNew
if (false) {
  if (true) {
    alert(1);
  }
} else {
  alert(2);
}
if (false) {
  if (true) {
    alert(1);
  } else {
    alert(2);
  }
}
Alerted 2Nothing alerted
switch 1
  when 1 then if yes then alert 1 else alert 2
OldNew
switch (1) {
  case 1:
    if (false) {
      alert(1);
    }
    break;
  default:
    alert(2);
}
switch (1) {
  case 1:
    if (false) {
      alert(1);
    } else {
      alert(2);
    }
}
Nothing alertedAlerts 2

As you can see in the output, in the first example the else binds to the first if instead of the second. The latter makes more sense. I don’t think the docs discuss what to expect in a case like this; if anything, I would’ve expected the compiler to throw an error complaining about ambiguity.

In the second example, the else previously bound to the when (creating a default: block) whereas now it binds to the if. Again, I think the new behavior is more what users would expect.

So the question is, what do we think of this change?

  1. It’s tiny. Slip it into 2.2.0.
  2. It’s big enough to be worth splitting into its own release; make 2.2.1 just this PR.
  3. It’s big enough to be worth splitting into its own release, and it’s not a patch; make 2.3.0 just this PR.
  4. It’s drastic! Wait for 3.0.0!

I’m inclined for option 2, releasing 2.2.1 as just this PR. This change feels to me a lot like when we fixed chaining between 1.12.6 and 1.12.7; technically it was a breaking change, but it corrected a bug. Though at least in that case the intended behavior was in the docs; this is changing undocumented behavior. On the other hand, 2.x is a lot less settled than 1.12.6 was, so changing undocumented behavior at this point should bother a lot fewer people.

@zdenko
Copy link
Collaborator Author

zdenko commented Jan 19, 2018

The last commit adds support for nested else in a single line.

// if a then if b then c else d  else e

if (a) {
  if (b) {
    c;
  } else {
    d;
  }
} else {
  e;
}

// if a then if b then if c then d else e else f

if (a) {
  if (b) {
    if (c) {
      d;
    } else {
      e;
    }
  } else {
    f;
  }
}

// if a then if b then if c then d else if e then f else if g then h else i else j else k

if (a) {
  if (b) {
    if (c) {
      d;
    } else if (e) {
      f;
    } else if (g) {
      h;
    } else {
      i;
    }
  } else {
    j;
  }
} else {
  k;
}

@GeoffreyBooth
Copy link
Collaborator

I’m thinking of just merging this in and releasing a big 2.2.0. Any objections? The minor (as opposed to batch) version bump should tell people that there might be meaningful changes in the release, as there very well could be for some people.

@zdenko This looks good to me. Anything else you have planned for this PR, or is it ready? And any other PRs you’re working on that should be slipped into 2.2.0?

@zdenko
Copy link
Collaborator Author

zdenko commented Jan 28, 2018

@GeoffreyBooth I think it's ready and can be merged. I'm still working on improved grammar which should take care of then and else (and some other) tags without Rewriter, but it's not yet ready for PR.
I have also other PRs with enhancements and new features proposals on the way, but those should go in the next release.

@GeoffreyBooth GeoffreyBooth merged commit c804087 into jashkenas:master Jan 29, 2018
@GeoffreyBooth
Copy link
Collaborator

Great, thanks for all your hard work!

@GeoffreyBooth GeoffreyBooth mentioned this pull request Feb 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Grammar: Inline function without parentheses used as if condition fails to compile
4 participants