
Supporting a new language (classic)

Andy Massimino edited this page Jan 15, 2022 · 8 revisions

In order for match-up to support a new language, you must define a suitable pattern for b:match_words.
If your language has a complicated syntax, or many keywords, you will need to know something about vim's regular-expressions.

The format for b:match_words is similar to that of the 'matchpairs' option: it is a comma (,)-separated list of groups; each group is a colon (:)-separated list of patterns (regular expressions). Colons and commas that are part of a pattern should be escaped with backslashes ('\:' and '\,'). It is OK to have only one group; the effect is undefined if a group has only one pattern. A simple example is

:let b:match_words = '\<if\>:\<endif\>,'
	\ . '\<while\>:\<continue\>:\<break\>:\<endwhile\>'

(In vim regular expressions, \< and \> denote word boundaries. Thus "if" matches the end of "endif" but "\<if\>" does not.) Then banging on the "%" key will bounce the cursor between "if" and the matching "endif"; and from "while" to any matching "continue" or "break", then to the matching "endwhile" and back to the "while". It is almost always easier to use literal-strings (single quotes) as above: '\<if\>' rather than "\\<if\\>" and so on.

Exception: If the ":" character does not appear in b:match_words, then it is treated as an expression to be evaluated. For example,

:let b:match_words = 'GetMatchWords()'

allows you to define a function. This can return a different string depending on the current syntax, for example. Note: this is deprecated in match-up; avoid it if possible.

Once you have defined the appropriate value of b:match_words, you will probably want to have this set automatically each time you edit the appropriate file type. The recommended way to do this is by adding the definition to a filetype-plugin file.
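As a minimal sketch of such a filetype-plugin file, assuming a hypothetical filetype named "mylang" (the file name and filetype are placeholders, not something match-up defines), the definition from the simple example could be placed as follows:

```vim
" Hypothetical file: ~/.vim/after/ftplugin/mylang.vim
" "mylang" is a placeholder filetype name used only for illustration.

" Do not overwrite a value that another plugin already set for this buffer.
if !exists('b:match_words')
  let b:match_words = '\<if\>:\<endif\>'
  let b:match_words .= ',\<while\>:\<continue\>:\<break\>:\<endwhile\>'
endif

" Remove the variable again when the buffer's filetype changes.
let b:undo_ftplugin = get(b:, 'undo_ftplugin', '')
if !empty(b:undo_ftplugin)
  let b:undo_ftplugin .= ' | '
endif
let b:undo_ftplugin .= 'unlet! b:match_words'
```

Building the string in two steps avoids line continuation, which depends on the 'cpoptions' setting in effect when the file is sourced.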

Tips: Be careful that your initial pattern does not match your final pattern. See the example above for the use of word-boundary expressions. It is usually better to use ".\{-}" (as few as possible) instead of ".*" (as many as possible). See \{-. For example, in the string "<tag>label</tag>", "<.*>" matches the whole string whereas "<.\{-}>" and "<[^>]*>" match "<tag>" and "</tag>".
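The difference between the greedy and non-greedy forms can be checked directly with Vim's built-in matchstr(). This is only an illustration of the regular expressions themselves, not of how match-up applies them; the variable name s:text is made up for the example, and the snippet is meant to be sourced from a script file.

```vim
let s:text = '<tag>label</tag>'

" Greedy: ".*" takes as many characters as possible.
" Prints <tag>label</tag>
echo matchstr(s:text, '<.*>')

" Non-greedy: ".\{-}" takes as few characters as possible.
" Prints <tag>
echo matchstr(s:text, '<.\{-}>')

" Excluding ">" from the repeated class has the same effect here.
" Prints <tag>
echo matchstr(s:text, '<[^>]*>')
```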

Spaces

If "if" is to be paired with "end if" (Note the space!) then word boundaries are not enough. Instead, define a regular expression s:notend that will match anything but "end" and use it as follows:

:let s:notend = '\%(\<end\s\+\)\@<!'
:let b:match_words = s:notend . '\<if\>:\<end\s\+if\>'

This is a simplified version of what is done for Ada. The s:notend is a script-variable. Similarly, you may want to define a start-of-line regular expression

:let s:sol = '\%(^\|;\)\s*'

if keywords are only recognized after the start of a line or after a semicolon (;), with optional white space.
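A quick way to sanity-check helper patterns like these is to test them against sample lines with the =~# operator before putting them into b:match_words. The sample strings below are invented for illustration, and the snippet is meant to be sourced from a script file so that the s: variables are valid.

```vim
let s:notend = '\%(\<end\s\+\)\@<!'
" "\|" separates the two alternatives: start of line, or a semicolon.
let s:sol = '\%(^\|;\)\s*'

" Prints 1: a plain "if" is accepted as an opening keyword.
echo 'if x > 0 then' =~# s:notend . '\<if\>'

" Prints 0: the "if" in "end if" is rejected by the look-behind.
echo 'end if' =~# s:notend . '\<if\>'

" Prints 1: "if" follows a semicolon and optional white space.
echo 'x = 1; if y' =~# s:sol . '\<if\>'

" Prints 0: "if" is neither at the start of the line nor after ";".
echo 'x if y' =~# s:sol . '\<if\>'
```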

Backrefs

In any group, the expressions \1, \2, ..., \9 refer to parts of the INITIAL pattern enclosed in \(escaped parentheses\). These are referred to as back references, or backrefs. For example,

:let b:match_words = '\<b\(o\+\)\>:\(h\)\1\>'

means that "bo" pairs with "ho" and "boo" pairs with "hoo" and so on. Note that "\1" does not refer to the "\(h\)" in this example. If you have "\(nested\(parentheses\)\)" then "\d" refers to the d-th "\(" and everything up to and including the matching "\)": in "\(nested\(parentheses\)\)", "\1" refers to everything and "\2" refers to "\(parentheses\)". If you use a variable such as s:notend or s:sol in the previous paragraph then remember to count any "\(" patterns in this variable. You do not have to count groups defined by \%(\).
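The numbering rules for groups can be explored with Vim's built-in matchlist(), which returns the whole match at index 0 and the captured groups at indexes 1 to 9. This sketch only illustrates how Vim numbers groups inside a single pattern; how match-up carries "\1" from the initial pattern into later patterns is a separate mechanism not shown here. The variable name s:m is made up for the example.

```vim
" Group 1 is the run of "o" captured by \(o\+\).
" Prints oo
echo matchlist('boo', '\<b\(o\+\)\>')[1]

" Nested groups are numbered by the position of their opening \(.
let s:m = matchlist('nestedparentheses', '\(nested\(parentheses\)\)')
" Prints nestedparentheses
echo s:m[1]
" Prints parentheses
echo s:m[2]

" \%(...\) groups are not counted, so \(if\) is still group 1.
" Prints if
echo matchlist('if', '\%(\<end\s\+\)\@<!\<\(if\)\>')[1]
```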

It should be possible to resolve back references from any pattern in the group. For example,

:let b:match_words = '\(foo\)\(bar\):more\1:and\2:end\1\2'

would not work because "\2" cannot be determined from "morefoo" and "\1" cannot be determined from "andbar". On the other hand,

:let b:match_words = '\(\(foo\)\(bar\)\):\3\2:end\1'

should work (and have the same effect as "foobar:barfoo:endfoobar"), although this has not been thoroughly tested.

You can use zero-width patterns such as \@<= and \zs.
For example, if the keyword "if" must occur at the start of the line, with optional white space, you might use the pattern "\(^\s*\)\@<=if" so that the cursor will end on the "i" instead of at the start of the line. For another example, if HTML had only one tag then one could

:let b:match_words = '<:>,<\@<=tag>:<\@<=/tag>'

so that "%" can bounce between matching "<" and ">" pairs or (starting on "tag" or "/tag") between matching tags. Without the \@<=, the script would bounce from "tag" to the "<" in "<tag>", and another "%" would not take you back to where you started.
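The effect of a zero-width prefix on where a match begins can be seen with Vim's built-in match(), which returns the byte index at which the match starts. This is a sketch of the regular-expression behavior only, using an invented sample line with four leading spaces.

```vim
" Without a zero-width prefix the match starts at the beginning of the line.
" Prints 0
echo match('    if x', '^\s*if')

" With \@<= the white space is required but is not part of the match.
" Prints 4
echo match('    if x', '\(^\s*\)\@<=if')

" \zs sets the start of the match in the same way.
" Prints 4
echo match('    if x', '^\s*\zsif')
```

Note that the "\(" in the look-behind version is a counted group, so it affects backref numbering, whereas the \zs version introduces no group.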

Reference

Adapted from matchit.txt.
