- Character class prefixes: `gc:` (general category), `sc:` (script), `scx:` (script extension), `blk:` (blocks).
-
For example, `[scx:Syriac]` matches all characters with the Syriac script extension.
+
- [Intersection](https://www.regular-expressions.info/charclassintersect.html) of character sets added, using the new `&` operator:
-
- Adds support for script extensions (currently supported in PCRE, JavaScript, and Rust)
-
- If the `blk:` prefix is used, `In` must be removed; e.g. `[InPrivate_Use]` becomes `[blk:Private_Use]`
-
- Writing the prefix is optional, except for script extensions
+
```pomsky
+
[Thai] & [Nd] # equivalent to the regex [\p{Thai}&&\p{Nd}]
+
```
-
- A `pomsky test` subcommand for running unit tests
+
Note that subtraction can be achieved by negating the character set to be subtracted:
-
- Two supported regex engines for testing: `pcre2` and `rust`
-
- The `--test` argument is now deprecated
+
```pomsky
+
[Thai] & ![Nd] # equivalent to the regex [\p{Thai}--\p{Nd}]
+
```
-
- Many optimizations (see below)
+
- Match [Script Extensions](https://www.unicode.org/L2/L2011/11406-script-ext.html), using the `scx:` or `script_extensions:` prefix:
-
### Changed
+
```pomsky
+
[scx:Syriac]
+
```
-
- Change hygiene of `lazy` and `unicode` mode to behave as one would expect.
-
Going forward, modes depend on the scope where an expression is defined, not where it is used:
+
Other Unicode properties also get optional prefixes:
```pomsky
-
let foo = 'foo'*; # this repetition is not lazy
-
(enable lazy; foo)
+
# old # new # alternative
+
[Latin] [sc:Latin] [script:Latin]
+
[InGreek] [blk:Greek] [block:Greek]
+
[Letter] [gc:Letter] [general_category:Letter]
```
-
- Increase the maximum length of group names from 32 to 128 characters.
-
Group names this long are supported in PCRE2 since version 10.44.
+
Note that the `In` prefix of Unicode blocks is omitted when the `blk:` or `block:` prefix is used. Unicode blocks written with `In` instead of `blk:` will be deprecated.
-
- Produce an error if the contents of a lookbehind assertion are not supported by the regex flavor (Java, Python, PCRE)
+
- `pomsky test` subcommand added to compile and test all `*.pomsky` files in a directory. This command ignores files matched by a `.ignore` or `.gitignore` file. For help, run `pomsky test --help`.
-
- Produce an error if infinite recursion is detected
+
- Unit tests can now be run with the Rust `regex` crate. To use it, specify `--flavor=rust` or `--engine=rust`.
-
- Remove the compatibility warning for lookbehind in JavaScript.
-
Lookbehind is now widely supported in JavaScript engines.
+
- Diagnostic to detect infinite recursion. If a recursive expression can never terminate, an error is shown.
-
- Allow all supported boolean Unicode properties in the Java flavor
+
### Changes
-
- Deprecate the `--test` argument; use `pomsky test -p <PATH>` instead
+
- The `lazy` and `unicode` modes are no longer inherited when expanding variables.
-
### Optimizations
+
> [!IMPORTANT]
+
> This changes the meaning of expressions such as this:
+
>
+
> ```pomsky
+
> let variable = 'test'*;
+
> enable lazy;
+
> variable
+
> ```
+
>
+
> Before Pomsky 0.12, the repetition was lazy, but now it isn't.
-
- De-duplicate and merge character ranges: `['b' 'a'-'f' 'c'-'m']` becomes `[a-m]`
+
The `enable` or `disable` statement has to appear before the repetition _syntactically_; it doesn't matter where the variable is used. The old behavior was unintuitive and easy to get wrong, so we fixed it.
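To illustrate why this change matters in practice, here is a minimal Python sketch comparing a greedy and a lazy repetition. It assumes that `'test'*` compiles to a regex along the lines of `(?:test)*` and that its lazy form is `(?:test)*?`; these patterns and the helper name `matched_prefix` are illustrative assumptions, not output copied from Pomsky.

```python
import re

# Assumed regex equivalents of the Pomsky repetition 'test'*
# (illustrative only, not verified Pomsky output).
GREEDY = r"(?:test)*"   # repetition defined outside `enable lazy`
LAZY = r"(?:test)*?"    # repetition defined after `enable lazy`


def matched_prefix(pattern: str, text: str) -> str:
    """Return the text matched at the start of `text` (hypothetical helper)."""
    match = re.match(pattern, text)
    return match.group(0) if match else ""


# The greedy repetition consumes as many repetitions as possible,
# while the lazy one stops as early as it can.
print(repr(matched_prefix(GREEDY, "testtest")))
print(repr(matched_prefix(LAZY, "testtest")))
```

Under these assumptions, the same variable matches the whole input when greedy and nothing when lazy, so which mode applies to a variable's definition changes what the surrounding expression matches.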
-
- Note that this doesn't work with Unicode classes, e.g. `Alphabetic`
+
- Optimize single-character alternatives, and merge adjacent or overlapping ranges.
For example, `'a' | ['bc'] | ['f'-'i']` is optimized to `[a-cf-i]`.
-
- This only works with string literals and character sets, for now
-
- Only adjacent alternatives can be merged to ensure that precedence isn't affected
+
> [!NOTE]
+
> The order of character ranges in a set is no longer preserved. Currently, they are sorted in ascending order; for example, `['x' 'X' 'A'-'F' 'a'-'f']` becomes `[A-FXa-fx]`.
-
- Combine single-character alternations into a set: `'a' | 'b' | 'c' | 'f'` becomes `[a-cf]`
+
- No longer warn about lookbehind in JavaScript. Lookbehind is now widely supported.
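
The range-merging optimization described above can be sketched in a few lines of Python. This is only an illustration of the general technique (sort the ranges, then fold overlapping or adjacent ones together); it is not Pomsky's actual implementation, and the function names `merge_ranges` and `to_char_class` are hypothetical.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive (first, last) character ranges.

    The result is sorted in ascending order, so the original order of the
    ranges is not preserved.
    """
    merged = []
    for start, end in sorted((ord(a), ord(b)) for a, b in ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or directly follows the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(chr(a), chr(b)) for a, b in merged]


def to_char_class(ranges):
    """Render merged ranges as a regex character class.

    Escaping of special characters is omitted to keep the sketch short.
    """
    parts = [a if a == b else f"{a}-{b}" for a, b in merge_ranges(ranges)]
    return "[" + "".join(parts) + "]"


# 'a' | ['bc'] | ['f'-'i'], written as inclusive ranges
print(to_char_class([("a", "a"), ("b", "b"), ("c", "c"), ("f", "i")]))  # [a-cf-i]
# ['x' 'X' 'A'-'F' 'a'-'f']
print(to_char_class([("x", "x"), ("X", "X"), ("A", "F"), ("a", "f")]))  # [A-FXa-fx]
```

Sorting first is what makes a single pass sufficient, and it is also why the output order differs from the input order, as the note above mentions.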