Rust reimplementation of the sed utility with some GNU sed, FreeBSD sed, and other extensions.
At this stage, sed implements all POSIX commands and correctly runs two complex scripts: hanoi.sed (which solves the Towers of Hanoi puzzle) and math.sed (which implements an arbitrary-precision integer math calculator).
The performance of this Rust implementation is now better than the GNU and FreeBSD implementations for most benchmarked cases.
Further work aims to improve runtime error reporting by including script coordinates in each command, adjust buffering on terminal output to match current implementations, implement more GNU extensions, and improve performance where possible.
Ensure you have Rust installed on your system. You can install Rust through rustup.
Clone the repository and build the project using Cargo:
git clone https://github.com/uutils/sed.git
cd sed
cargo build --release
cargo run --release
- Command-line arguments can be specified in long (--) form.
- Spaces can precede a regular expression modifier.
- I can be used as a synonym for the i (case insensitive) substitution flag.
- In addition to \n, other escape sequences (octal, hex, C) are supported in the strings of the y command. Under POSIX these yield undefined behavior.
- The substitution command replacement group \0 is a synonym for &.
- A Q command (optionally followed by an exit code) quits immediately.
- The q command can be optionally followed by an exit code.
- The l command can be optionally followed by the output width.
- The --follow-symlinks flag for in-place editing.
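
The equivalence of \0 and & in a replacement can be pictured with a small Rust sketch. It is not taken from the project's source: the function name `expand_replacement` is hypothetical, and numbered groups \1 to \9 are deliberately out of scope, so they are copied through unchanged.

```rust
/// Expand a substitution replacement template against the text of the
/// whole match. Illustrative only: handles `&`, `\0`, `\&` and `\\`.
fn expand_replacement(template: &str, whole_match: &str) -> String {
    let mut out = String::new();
    let mut chars = template.chars();
    while let Some(c) = chars.next() {
        match c {
            // An unescaped `&` stands for the whole match.
            '&' => out.push_str(whole_match),
            '\\' => match chars.next() {
                // `\0` is treated as a synonym for `&`.
                Some('0') => out.push_str(whole_match),
                // `\&` and `\\` produce the literal character.
                Some('&') => out.push('&'),
                Some('\\') => out.push('\\'),
                // Anything else (such as `\1`) is copied through untouched here.
                Some(other) => {
                    out.push('\\');
                    out.push(other);
                }
                // A trailing backslash is kept as-is.
                None => out.push('\\'),
            },
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    // Both templates wrap the matched text in brackets.
    println!("{}", expand_replacement("[&]", "abc"));
    println!("{}", expand_replacement(r"[\0]", "abc"));
}
```

Both calls in `main` produce the same string, which is the point of the synonym.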
- The second address in a range can be specified as a relative address with +N.
- In-place editing of files with the -i flag.
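
As a rough illustration of the relative second address, the sketch below resolves where a range ends once its first address has matched. The `RangeEnd` type and both function names are hypothetical and are not the project's actual data structures. Regular expression and $ end addresses are omitted.

```rust
/// Hypothetical representation of the second address of a range.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RangeEnd {
    /// An absolute line number, as in `2,5`.
    Line(usize),
    /// A relative offset, as in `2,+3`.
    Relative(usize),
}

/// Parse a non-empty run of ASCII digits into a number.
fn parse_digits(text: &str) -> Option<usize> {
    if !text.is_empty() && text.bytes().all(|b| b.is_ascii_digit()) {
        text.parse().ok()
    } else {
        None
    }
}

/// Parse the second address: `+N` is relative, a plain `N` is absolute.
fn parse_range_end(text: &str) -> Option<RangeEnd> {
    match text.strip_prefix('+') {
        Some(offset) => parse_digits(offset).map(RangeEnd::Relative),
        None => parse_digits(text).map(RangeEnd::Line),
    }
}

/// Return the last line of a range that started at line `start`.
fn last_line_of_range(start: usize, end: RangeEnd) -> usize {
    match end {
        // If the end line is not after the start, only the start line matches.
        RangeEnd::Line(n) => n.max(start),
        // `+N` extends the range N lines past the start line.
        RangeEnd::Relative(n) => start + n,
    }
}

fn main() {
    // The range `2,+3` covers lines 2 through 5.
    let end = parse_range_end("+3").expect("valid relative address");
    println!("range 2,+3 ends at line {}", last_line_of_range(2, end));
}
```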
- Unicode characters can be specified in regular expression pattern, replacement, and transliteration sequences using \uXXXX or \UXXXXXXXX sequences.
- The l command lists Unicode characters using the \uXXXX and \UXXXXXXXX sequences.
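
One way to picture how \uXXXX and \UXXXXXXXX sequences map to characters is the sketch below. It is an assumption-laden illustration, not the project's parser: the names `decode_unicode_escapes` and `read_hex` are hypothetical, and every other backslash sequence is simply passed through unchanged.

```rust
/// Read exactly `digits` hexadecimal digits and convert them to a char.
fn read_hex(chars: &mut std::str::Chars<'_>, digits: usize) -> Result<char, String> {
    let hex: String = chars.by_ref().take(digits).collect();
    if hex.len() != digits || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("expected {digits} hex digits, found {hex:?}"));
    }
    let code = u32::from_str_radix(&hex, 16).map_err(|e| e.to_string())?;
    // Surrogates and values above U+10FFFF are not valid scalar values.
    char::from_u32(code).ok_or_else(|| format!("U+{code:X} is not a valid character"))
}

/// Replace `\uXXXX` and `\UXXXXXXXX` with the characters they name.
fn decode_unicode_escapes(input: &str) -> Result<String, String> {
    let mut out = String::new();
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('u') => out.push(read_hex(&mut chars, 4)?),
            Some('U') => out.push(read_hex(&mut chars, 8)?),
            // Other escapes are outside this sketch and are kept verbatim.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    Ok(out)
}

fn main() {
    match decode_unicode_escapes(r"caf\u00E9 \U0001F600") {
        Ok(text) => println!("{text}"),
        Err(err) => eprintln!("error: {err}"),
    }
}
```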
- The input is assumed to be valid UTF-8 (this includes 7-bit ASCII). If the input is in another code page, consider converting it to UTF-8 first, both to avoid errors on invalid UTF-8 sequences and to ensure that regular expressions are handled correctly. This sed program can also handle arbitrary byte sequences if no part of the input is treated as a string.
- The command will report an error and fail if duplicate labels are found in the script. This matches the BSD behavior. The GNU version accepts duplicate labels.
- The last line ($) address is interpreted as the last non-empty line of the last file. If all the files from a given argument through the last one are empty, the last line condition will never be triggered. This behavior is consistent with the original implementation.
- Labels are parsed for alphanumeric characters. The BSD version parses them until the end of the line, preventing ; from being used as a separator.
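
The label rules above (alphanumeric parsing, and an error on duplicates) can be sketched in a few lines of Rust. This is a simplified illustration under stated assumptions, not the project's code: `split_label` and `register_label` are hypothetical names, and the error message wording is invented for the example.

```rust
use std::collections::HashSet;

/// Split the text after `:` into the label and whatever follows it.
/// The label ends at the first non-alphanumeric character, so `;`
/// can act as a command separator.
fn split_label(text: &str) -> (&str, &str) {
    let end = text
        .find(|c: char| !c.is_alphanumeric())
        .unwrap_or(text.len());
    text.split_at(end)
}

/// Record a label definition, rejecting a name that was already defined.
fn register_label(seen: &mut HashSet<String>, name: &str) -> Result<(), String> {
    if seen.insert(name.to_string()) {
        Ok(())
    } else {
        Err(format!("duplicate label '{name}'"))
    }
}

fn main() {
    // In a script such as `:loop;N`, the label is `loop` and `;N` remains.
    let (label, rest) = split_label("loop;N");
    println!("label={label:?} rest={rest:?}");

    let mut seen = HashSet::new();
    for name in ["loop", "end", "loop"] {
        match register_label(&mut seen, name) {
            Ok(()) => println!("defined {name}"),
            Err(err) => println!("error: {err}"),
        }
    }
}
```

Parsing to the end of the line instead, as described for the BSD version, would make the whole of `loop;N` the label name.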
sed is licensed under the MIT License; see the LICENSE file for details.