
Python ast generators

Almar Klein edited this page Jul 7, 2016 · 4 revisions

A list of possibilities to do AST parsing in Python

Dabeaz' ply.lex and ply.yacc

These seem like pretty solid tools, but we would have to write the grammar for the tokenizer and the parser in Ply's specific format.

Plyplus

https://github.com/erezsh/plyplus

Uses Ply, but provides a different way to write the grammar. Has a Python grammar, but it does not seem that solid (there are some issues about it).

PyJS does this

They make use of lib2to3, and then the code gets fuzzy for me.

Pygments

Should have a good tokenizer, but we would still need a parser.

tokenize.py

Is part of the Python standard library and is pure Python, but it is only a tokenizer. I think this is also used in lib2to3.
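To give an idea of what the tokenizer alone provides, here is a minimal sketch that turns a source string into a flat list of tokens. The helper name `list_tokens` is made up for this illustration; only `tokenize.generate_tokens` and `tokenize.tok_name` come from the standard library. Note that the result is a flat sequence with no nesting, which is why a parser is still needed on top.

```python
import io
import tokenize


def list_tokens(source):
    """Return (token_name, token_string) pairs for a source string.

    `list_tokens` is a hypothetical helper for this page, not part of tokenize.
    """
    # generate_tokens() expects a readline-style callable that yields str lines.
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


tokens = list_tokens("x = 1 + 2\n")
for name, text in tokens:
    print(name, repr(text))
```

Each token also carries start and end positions (`tok.start`, `tok.end`), which would be needed for error reporting in a parser built on top of this.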

Baron

https://github.com/Psycojoker/baron

Implements a Full Syntax Tree (FST) that allows writing the code back exactly as it was. Currently has no proper 3.x support, though. There is an issue about possibly basing it on lib2to3.

Jedi

https://github.com/davidhalter/jedi

Uses a modified version of tokenize and pgen2 (part of lib2to3). Seems to have good py3k support, but it also aims to support broken code, and I read somewhere that expressions are only partly represented.

lib2to3

https://github.com/python/cpython/tree/master/Lib/lib2to3

Is part of the Python standard library. Implements a pure Python AST parser. Documentation is sparse, though.

Some projects that make use of it: PyJS, https://github.com/google/yapf, ...

python4ply

The project has not been maintained for years, but it translates the official Python grammar file to Ply's grammar format. We would have to 'revive' this project, though.

Emscripten on Python's ast.c

But then you'd want to separate it from the rest of the CPython lib. Should be possible, but may not be the easiest route.
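For reference, the kind of tree that CPython's own parser produces can be inspected from Python through the standard library `ast` module. The snippet below is only a sketch of what the target output looks like; it says nothing about how hard the Emscripten route would be.

```python
import ast

# Parse a small statement with CPython's built-in parser.
tree = ast.parse("x = 1 + 2")

# The module body holds one statement: an assignment whose value is a
# binary operation.
assign = tree.body[0]
print(type(assign).__name__)
print(type(assign.value).__name__)

# ast.dump() gives a textual view of the whole tree; the exact format
# differs between Python versions.
print(ast.dump(tree))
```

Any alternative parser from the list above would ideally yield something that can be mapped onto these node types.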
