Adding a new lexer
This is a newbie's guide to adding a new lexer to Rouge, written because I did not find one while working on my Turtle lexer. Note that I had never seen Ruby and related tools before, so some parts of the guide may be obvious. This works on Linux and on Windows using Cygwin. Thanks to @mjclemente for his useful blog posts on setting up Rouge and creating a lexer; this guide is based on them.
- Fork Rouge.
- Clone your fork, i.e. not $ git clone https://github.com/jneen/rouge.git, but your own repo.
- Follow Setting up Ruby or @mjclemente's blog post to set up the environment.
- Note that if you cannot find rackup, add it to PATH like this: export PATH=$PATH:~/.gem/ruby/gems/rack-1.6.4/bin/ e.g. in .bashrc in your home directory.
- You can run the rougify script on a file.
- You can run rackup and see all available lexers and their demos on http://localhost:9292 and a specific one (e.g. XML) on http://localhost:9292/xml
- Think of a name for your new lexer. I was doing a lexer for Turtle, so I chose turtle.
Yes, we are going to copy & paste an existing lexer and iteratively make it our own. I use turtle, you will use your lexer name. I start with the xml lexer; you should start with the lexer that is closest to your language. However, if your language is very close to one that already has a lexer, consider extending that lexer instead of creating a new one.
- Copy /spec/lexers/lexername_spec.rb to /spec/lexers/turtle_spec.rb. This is basically just an outside description (like an interface) of the lexer. Change Rouge::Lexers::XML to Rouge::Lexers::Turtle on line 3 and Rouge::Lexers::XML.new to Rouge::Lexers::Turtle.new on line 4. Rouge guesses the input file format based on filename extension, MIME type and content, so adjust the three blocks by adding/removing lines to match your format's extensions and MIME types.
- Copy /spec/visual/samples/xml to /spec/visual/samples/turtle. This is a longer input file that gets lexed on http://localhost:9292/xml. Replace its content with a longer file in your language, using as many of the language constructs as possible, ideally all.
- Copy /lib/rouge/demos/lexername to /lib/rouge/demos/turtle. This is the short language demo shown in the list on http://localhost:9292. Again, provide a short input in your language, showing as much of the language as possible.
- Copy /lib/rouge/lexers/xml.rb to /lib/rouge/lexers/turtle.rb. This is the code of the lexer itself. Change class XML < RegexLexer to class Turtle < RegexLexer on line 5, and change the title, description, filenames (extensions) and MIME types to match those from the spec file. Finally, adjust the def self.analyze_text(text) method, which takes e.g. the first 1000 characters of the input file and matches them against a regex. If there is a match, it returns a match probability.
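To make the content-detection idea more concrete, here is a minimal sketch in plain Ruby, written as a stand-alone method so it runs without Rouge. The method name turtle_probability, the regexes and the probability values are my own assumptions for illustration; in a real lexer the same logic would sit inside def self.analyze_text(text) of the lexer class.

```ruby
# Hypothetical stand-alone version of a content-detection method.
# The name, the patterns and the numbers are illustrative assumptions.
def turtle_probability(text)
  head = text[0, 1000] # only inspect the beginning of the input

  # Turtle files commonly start with @prefix or @base directives.
  return 0.8 if head =~ /^\s*@(prefix|base)\s/

  # SPARQL-style PREFIX/BASE directives also occur in Turtle,
  # but other languages use them too, so be less confident.
  return 0.3 if head =~ /^\s*(prefix|base)\s/i

  0.0 # nothing recognised
end
```

Calling turtle_probability on the first lines of your sample file is a quick way to check the regex before wiring it into the lexer.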
If you are new to Ruby and its regexes, read the specification, especially if in doubt about %r, /i, /b, etc.
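As a quick orientation, the following plain-Ruby sketch shows a few regex notations you are likely to meet in lexer files: the %r{...} literal, the i flag, and the \b word boundary. The constant names and sample strings are made up for this example.

```ruby
# Two spellings of the same regex: %r{...} avoids escaping the slashes.
SLASHES_ESCAPED = /^https?:\/\//
SLASHES_PLAIN   = %r{^https?://}

# The i flag after the closing delimiter makes the match case-insensitive.
PREFIX_KEYWORD = /prefix/i

# \b is a word boundary: it keeps "a" from matching inside "abc".
A_KEYWORD = /\ba\b/

puts SLASHES_PLAIN.match?("http://localhost:9292") # true
puts PREFIX_KEYWORD.match?("PREFIX ex:")           # true
puts A_KEYWORD.match?("abc")                       # false
```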
Now, when you access http://localhost:9292, you should see your language, turtle in my case, listed with a demo, and on http://localhost:9292/turtle you should see the longer sample. Of course, the highlighting rules are still those of the untouched source lexer, which probably means lots of errors (red highlights) in your file.
Also, you should be able to run the test without errors.
The work on the lexer usually goes like this:
- With rackup running, in one browser window you have http://localhost:9292 to see the demo file and http://localhost:9292/turtle to see the sample file.
- In your favorite text editor, ideally with Ruby syntax highlighting, you have the lexer /lib/rouge/lexers/turtle.rb, which contains a set of rules.
- In another window you have the list of tokens produced by the rules which annotate the text.
- You change the rules in the lexer (a few tips in the next section), save, refresh the browser, and repeat until done.
After you are done with your lexer, commit and push it to your forked repository (it should be the 4 files) and create a pull request. After a while, check whether it passes the tests.
OK, with the Turtle lexer, I did a simple thing with no custom states (all rules in :root). If you need something more complex, it is up to you. This is a list of documents I used:
And a few tips
- Order of the rules matters!
- Start with something simple :)
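To illustrate why the order of the rules matters, here is a toy tokenizer built on Ruby's standard StringScanner. It is not Rouge's implementation, only a sketch of the "first matching rule wins" behaviour, and the token names (:text, :name, :keyword, :error) as well as the two rule lists are assumptions made up for this example.

```ruby
require 'strscan'

# At each position, try the rules in order and take the first one that matches.
def tokenize(input, rules)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    rule = rules.find { |pattern, _type| scanner.scan(pattern) }
    if rule
      tokens << [rule[1], scanner.matched]
    else
      # No rule matched: consume one character as an error token.
      tokens << [:error, scanner.getch]
    end
  end
  tokens
end

# Same rules, different order. In Turtle, "a" is a keyword.
NAME_FIRST    = [[/\s+/, :text], [/\w+/, :name], [/a\b/, :keyword]]
KEYWORD_FIRST = [[/\s+/, :text], [/a\b/, :keyword], [/\w+/, :name]]

p tokenize("x a y", NAME_FIRST)    # "a" is swallowed by the generic :name rule
p tokenize("x a y", KEYWORD_FIRST) # "a" is reported as :keyword
```

With NAME_FIRST the keyword rule is never reached, because the more general \w+ rule already matches. Putting specific rules before general ones avoids this.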