Rethink gensym format #1531

Closed
gilch opened this issue Mar 13, 2018 · 9 comments · Fixed by #1773

Comments

@gilch
Member

gilch commented Mar 13, 2018

Now that #1517 has landed, we might want to rethink the gensym format.
_;let|1235 is a lot more readable than

_hyx_ΔsemicolonΔletΔvertical_lineΔ1235

As mentioned in #1458, we could improve this a lot by changing the | to a - or _.

=> (mangle "_;let-1235")
'_hyx_ΔsemicolonΔlet_1235'

The ΔsemicolonΔ part is harder to deal with. We don't want gensyms to be lexically valid symbols, which doesn't leave a lot of options. Symbols like _()let-1235 would be even longer mangled. But we could at least separate it from the gensym name with another -.

=> (mangle "_;-let-1235")
'_hyx_ΔsemicolonΔ_let_1235'

I'm not sure if we can do better.
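To make the examples above easier to play with, here is a minimal Python sketch of the kind of mangling being discussed. It is an approximation written for illustration, not Hy's actual implementation: the function name `mangle_sketch` and its `delim` parameter are assumptions, and special cases such as `?` suffixes and dotted names are ignored.

```python
import unicodedata


def mangle_sketch(name, delim="Δ"):
    """Turn a Hy-style symbol name into a legal Python identifier (sketch)."""
    name = name.replace("-", "_")
    body = name.lstrip("_")
    leading = name[: len(name) - len(body)]
    if (leading + body).isidentifier():
        return leading + body  # nothing illegal, no "hyx_" prefix needed
    parts = []
    for ch in body:
        # "S" is prepended because some characters are legal inside an
        # identifier but not at its start (digits, for example).
        if ch != delim and ("S" + ch).isidentifier():
            parts.append(ch)
        else:
            # Spell the character out by its Unicode name, or by its hex
            # code point when it has no name.
            uname = unicodedata.name(ch, "").lower()
            uname = uname.replace("-", "H").replace(" ", "_")
            parts.append(delim + (uname or "U{:x}".format(ord(ch))) + delim)
    return leading + "hyx_" + "".join(parts)


for symbol in ("_;let|1235", "_;let-1235", "_;-let-1235"):
    print(symbol, "->", mangle_sketch(symbol))
```

Under these assumptions the three symbols come out as the three mangled names quoted in the comment, which shows that the `|` accounts for most of the extra length.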

@Kodiologist
Member

That last option seems okay to me.

> We don't want gensyms to be lexically valid symbols

Consider that we've already implicitly staked claims on names beginning with hy_ (and hyx_, in a different way) even though they're lexically valid. It doesn't seem too unreasonable to use lexically valid gensyms, too, so long as they begin with hy_ or _hy_ or so.

@Kodiologist
Member

Kodiologist commented Apr 5, 2018

What do you think of something like the following?

  • (gensym) → _hy_G_1235
  • (gensym "foo") → _hy_Gfoo_1235

@gilch
Member Author

gilch commented Apr 5, 2018

But then how would (gensym '!) look? Is it going to get double-mangled?
_hyx_hy_GXexclamation_markX_1235?

The double prefix looks weird. I think we have to re-use the _hyx_ prefix and integrate it into the mangling better. Maybe just X-quote the gensym suffix.
(gensym '!) to _hyx_Xexclamation_markXX1235X
(gensym 'foo) to _hyx_fooX1236X
(gensym) to _hyx_X1237X

But then the gensym name gets lost in the middle. Maybe put the gensym suffix right after the _hyx_.
(gensym '!) to _hyx_X1235XXexclamation_markX
(gensym 'foo) to _hyx_X1236Xfoo
(gensym) to _hyx_X1237X

And maybe add another underscore for legibility.
(gensym '!) to _hyx_X1235X_Xexclamation_markX
(gensym 'foo) to _hyx_X1236X_foo
(gensym) to _hyx_X1237X_


The point of using gensyms is to guarantee that accidental capture can't happen.

I did not realize before, but the mangling makeover gives us a lexically-valid alias for any lexically-invalid gensym we might come up with. E.g. even if we had _;let-1235 in our macroexpansion, _hyx_XsemicolonXlet_1235 would still capture it.

So it's now pointless to make gensyms lexically invalid, which means we're free to make them more readable.
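The aliasing argument can be checked with a small Python sketch. The `mangle_sketch` function below is a hypothetical approximation of the X-delimited mangling (it is not Hy's code and skips special cases); it only serves to show that a lexically invalid gensym and its lexically valid spelling land on the same Python identifier.

```python
import unicodedata


def mangle_sketch(name):
    """Approximate the X-delimited mangling discussed in this thread."""
    name = name.replace("-", "_")
    body = name.lstrip("_")
    leading = name[: len(name) - len(body)]
    if (leading + body).isidentifier():
        return leading + body  # already a legal Python identifier

    def spell(ch):
        if ch != "X" and ("S" + ch).isidentifier():
            return ch
        uname = unicodedata.name(ch, "").lower()
        uname = uname.replace("-", "H").replace(" ", "_")
        return "X" + (uname or "U{:x}".format(ord(ch))) + "X"

    return leading + "hyx_" + "".join(spell(ch) for ch in body)


invalid_gensym = "_;let-1235"             # cannot be typed as a symbol
valid_alias = "_hyx_XsemicolonXlet_1235"  # can be typed as a symbol

# If both names mangle to the same identifier, user code that writes the
# alias would capture the gensym.
print(mangle_sketch(invalid_gensym) == mangle_sketch(valid_alias))
```

Because the alias is already a legal identifier, mangling leaves it untouched, so the comparison holds no matter how unreadable the original gensym was.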

Shorter gensyms are more readable, but long names are much less likely to cause accidents than short ones. There never has been a mechanism to stop the user from accidentally colliding with "compiler" gensyms, like _hy_anon_var_1.

Maybe it's enough to tell the user not to use certain prefixes. But we could make this much more robust. For example, the HySymbol constructor could use unmangle to always return an unmangled symbol if passed a string containing the r"_*hyx_.*" prefix. This way it would raise an error if the user attempted to construct a gensym, since (unmangle "_hyx_X1235X_") is an error.
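A rough Python sketch of that guard, under stated assumptions: `unmangle_sketch` is only loosely modeled on the unmangling described in this thread, and `checked_symbol_name` is a hypothetical helper standing in for whatever the HySymbol constructor would do. Neither is Hy's real API.

```python
import re
import unicodedata

HYX_PATTERN = re.compile(r"_*hyx_.*")


def unmangle_sketch(name):
    """Decode X-delimited character names back into characters (sketch)."""
    body = name.lstrip("_")
    leading = len(name) - len(body)
    if body.startswith("hyx_"):
        def decode(match):
            if match.group(1):  # "U" + hex code point, for unnamed characters
                return chr(int(match.group(2), 16))
            uname = match.group(2).replace("_", " ").replace("H", "-").upper()
            return unicodedata.lookup(uname)  # KeyError if no such character

        body = re.sub(r"X(U)?([_a-z0-9H]+?)X", decode, body[len("hyx_"):])
    return "-" * leading + body.replace("_", "-")


def checked_symbol_name(name):
    """Hypothetical constructor hook: normalize hyx_ names or reject them."""
    if HYX_PATTERN.fullmatch(name):
        try:
            return unmangle_sketch(name)
        except (KeyError, ValueError) as err:
            raise ValueError("not a valid mangled name: " + name) from err
    return name


print(checked_symbol_name("hyx_XsolidusX"))
try:
    checked_symbol_name("_hyx_X1235X_")
except ValueError as err:
    print("rejected:", err)
```

In this sketch the gensym-style name is rejected because `X1235X` is parsed as a character name and no Unicode character is called "1235".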

@vodik
Contributor

vodik commented Apr 6, 2018 via email

@Kodiologist
Member

> So it's now pointless to make gensyms lexically invalid, which means we're free to make them more readable.

Right.

> Maybe it's enough to tell the user not to use certain prefixes. But we could make this much more robust. For example, the HySymbol constructor could use unmangle to always return an unmangled symbol if passed a string containing the r"_*hyx_.*" prefix. This way it would raise an error if the user attempted to construct a gensym, since (unmangle "_hyx_X1235X_") is an error.

I don't want to prevent users from constructing a gensym on purpose, just by accident. They may want to do something sneaky with some Python code that was previously Hy-generated, or something.

I didn't think about the double prefix that would result from something like (gensym "|"). In practice, I don't think it will be much of a concern because you would typically write just (gensym), providing an argument only if you're interested in debugging the generated Python code or something, and if you're going to do that, it stands to reason you should use an argument that doesn't need mangling.

If you really want to avoid a double prefix, we could forget about the _hy_ and instead try something like beginning gensyms with an underscore followed by a Unicode private-use character. This means that every gensym's mangled name will start with _hyx_ and collisions are very unlikely. This would look like (in mangled form)

  • (gensym) → _hyx_XUefafX_1235
  • (gensym "foo") → _hyx_XUefafX_foo_1235
  • (gensym "+") → _hyx_XUefafX_Xplus_signX_1235
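Here is a small Python sketch of how such names could be produced. Everything in it is an assumption made for illustration: `gensym_name` is a hypothetical helper that takes the counter value explicitly, U+EFAF is used only because it appears in the examples, and `mangle_sketch` approximates the X-delimited mangling rather than reproducing Hy's code.

```python
import unicodedata

PRIVATE_USE = "\uefaf"  # private-use code point taken from the examples


def mangle_sketch(name):
    """Approximate the X-delimited mangling discussed in this thread."""
    name = name.replace("-", "_")
    body = name.lstrip("_")
    leading = name[: len(name) - len(body)]
    if (leading + body).isidentifier():
        return leading + body

    def spell(ch):
        if ch != "X" and ("S" + ch).isidentifier():
            return ch
        # Private-use characters have no Unicode name, so they fall back
        # to "U" plus the hex code point.
        uname = unicodedata.name(ch, "").lower()
        uname = uname.replace("-", "H").replace(" ", "_")
        return "X" + (uname or "U{:x}".format(ord(ch))) + "X"

    return leading + "hyx_" + "".join(spell(ch) for ch in body)


def gensym_name(number, label=""):
    """Build an unmangled gensym: underscore, private-use char, label, number."""
    parts = ["_" + PRIVATE_USE]
    if label:
        parts.append(label)
    parts.append(str(number))
    return "-".join(parts)


for label in ("", "foo", "+"):
    print(mangle_sketch(gensym_name(1235, label)))
```

The private-use character is never a legal identifier character, so every name built this way is forced through the `hyx_` path, which is what keeps the prefix consistent.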

> It does beg the question though, should mangle always be reversible?

It's never going to round-trip properly because e.g. a-b and a_b both mangle to a_b. But unmangle shouldn't crash if you pass in a mangled name, no. It should only crash if given a string that looks mangled but isn't actually a valid mangled name, or if given the empty string.

@gilch
Member Author

gilch commented Apr 7, 2018

> I don't want to prevent users from constructing a gensym on purpose, just by accident.

Obviously the gensym function itself would have to do it somehow. Users wishing to construct a gensym on purpose could do it the same way. It could be a simple extra flag to HySymbol, e.g. (HySymbol _hyx_X1236X_foo :gensym 1)

> only if you're interested in debugging the generated Python code or something

Often true, which is why I wanted the mangling to be human-readable in the first place, but we often debug macros using macroexpand, which works on hytrees, and that's pre-mangle, right?

> try something like beginning gensyms with an underscore followed by a Unicode private-use character

This actually seems like a pretty good solution.

> should mangle always be reversible?

I think that would be nice. We could do it by swapping - and _, unless it's part of a leading or trailing train of underscores/hyphens. e.g. a-b would mangle to a_b like we want, but a_b would mangle to hyx_aXlow_lineXb.

The leading/trailing exception means we have to do private and dunder names with underscores like in Python as I recommended in #1062. So __a-b__ would mangle to __a_b__. You'd always have to write __init__ in Hy, not --init-- or _-init-_ or whatever like we can now. Seems cleaner. (--init-- would have to mangle to hyx_XhyphenHminusXXhyphenHminusXinitXhyphenHminusXXhyphenHminusX to be reversible, which is nothing like what Python would recognize as a dunder name.)

@gilch
Member Author

gilch commented Apr 7, 2018

One more point about reversibility: a symbol entered with the hyx prefix would have to be mangled again for full reversibility, e.g. (mangle 'hyx_XsolidusX) would result in hyx_hyxXlow_lineXXlatin_capital_letter_xXsolidusXlatin_capital_letter_xX. I don't think this would come up often, but it would be another good reason to add the empty character name as an alias for X, which would shorten it to hyx_hyxXlow_lineXXXsolidusXX.
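The reversible scheme proposed in these two comments can be sketched in Python as follows. This is one possible reading of the proposal, not an implementation that exists in Hy: the function names are hypothetical, the rule that a symbol starting with `hyx-` must also be escaped is an added assumption needed to keep the mapping one-to-one, and the suggested empty-name alias for X is not implemented.

```python
import re
import unicodedata


def _split(name):
    """Separate leading and trailing underscore trains from the core."""
    rest = name.lstrip("_")
    lead = name[: len(name) - len(rest)]
    core = rest.rstrip("_")
    return lead, core, rest[len(core):]


def _escape(ch):
    uname = unicodedata.name(ch, "").lower().replace("-", "H").replace(" ", "_")
    return "X" + (uname or "U{:x}".format(ord(ch))) + "X"


def reversible_mangle(name):
    lead, core, trail = _split(name)
    plain = core.replace("-", "_")
    needs_escape = bool(core) and (
        "_" in core
        or core.startswith("-")
        or core.endswith("-")
        or plain.startswith("hyx_")  # assumption: avoid clashing with the prefix
        or not plain.isidentifier()
    )
    if not needs_escape:
        return lead + plain + trail
    n_lead = len(core) - len(core.lstrip("-"))
    n_trail = len(core) - len(core.rstrip("-"))
    out = []
    for i, ch in enumerate(core):
        if ch == "-" and n_lead <= i < len(core) - n_trail:
            out.append("_")  # interior hyphens swap to underscores
        elif ch in "-_X" or not ("S" + ch).isidentifier():
            out.append(_escape(ch))  # everything ambiguous is spelled out
        else:
            out.append(ch)
    return lead + "hyx_" + "".join(out) + trail


def reversible_unmangle(name):
    lead, core, trail = _split(name)
    if not core.startswith("hyx_"):
        return lead + core.replace("_", "-") + trail

    def decode(match):
        if match.group(0) == "_":
            return "-"
        if match.group(1):
            return chr(int(match.group(2), 16))
        uname = match.group(2).replace("_", " ").replace("H", "-").upper()
        return unicodedata.lookup(uname)

    body = re.sub(r"X(U)?([_a-z0-9H]+?)X|_", decode, core[len("hyx_"):])
    return lead + body + trail


for symbol in ("a-b", "a_b", "__a-b__", "--init--", "hyx_XsolidusX"):
    mangled = reversible_mangle(symbol)
    print(symbol, "->", mangled, "->", reversible_unmangle(mangled))
```

The sketch escapes a literal X only when the `hyx_` prefix is in play, since names without the prefix are never decoded on the way back.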

@vodik
Contributor

vodik commented Apr 7, 2018

Sorry, to be more clear, with reversible, I really was aiming at stable. Don't know the appropriate math term - it's almost idempotence:

=> (= (-> "foo-bar_baz" (mangle) (unmangle) (mangle))
...   (-> "foo-bar_baz" (mangle)))

I was reading some of the ideas tossed around as toying with the idea of marrying gensym and mangle somewhat, but going back over the thread, I don't think anyone really suggested that.

@Kodiologist
Member

> we often debug macros using macroexpand, which works on hytrees, and that's pre-mangle, right?

That's right.

> Sorry, to be more clear, with reversible, I really was aiming at stable. Don't know the appropriate math term - it's almost idempotence

mangle is indeed idempotent, or ought to be, but the property you wrote down should be true as well.
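Both properties can be spot-checked with a short Python sketch. The two functions below are simplified stand-ins for mangle and unmangle (hypothetical names, no `?` handling, X as the delimiter), so passing here says nothing definitive about Hy itself; it only illustrates what the properties mean.

```python
import re
import unicodedata


def mangle_sketch(name):
    name = name.replace("-", "_")
    body = name.lstrip("_")
    leading = name[: len(name) - len(body)]
    if (leading + body).isidentifier():
        return leading + body

    def spell(ch):
        if ch != "X" and ("S" + ch).isidentifier():
            return ch
        uname = unicodedata.name(ch, "").lower()
        uname = uname.replace("-", "H").replace(" ", "_")
        return "X" + (uname or "U{:x}".format(ord(ch))) + "X"

    return leading + "hyx_" + "".join(spell(ch) for ch in body)


def unmangle_sketch(name):
    body = name.lstrip("_")
    leading = len(name) - len(body)
    if body.startswith("hyx_"):
        def decode(match):
            if match.group(1):
                return chr(int(match.group(2), 16))
            uname = match.group(2).replace("_", " ").replace("H", "-").upper()
            return unicodedata.lookup(uname)

        body = re.sub(r"X(U)?([_a-z0-9H]+?)X", decode, body[len("hyx_"):])
    return "-" * leading + body.replace("_", "-")


def is_idempotent(name):
    once = mangle_sketch(name)
    return mangle_sketch(once) == once


def is_stable(name):
    once = mangle_sketch(name)
    return mangle_sketch(unmangle_sketch(once)) == once


for symbol in ("foo-bar_baz", "_;let-1235", "a-b", "a_b"):
    print(symbol, is_idempotent(symbol), is_stable(symbol))
```

Note that stability is weaker than round-tripping: a-b and a_b both mangle to a_b, so unmangling cannot tell them apart, yet mangling the unmangled result still gives a_b.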
