From 2db332b3f3e75cd8090a4d1a13b804004f0cc2ee Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Mon, 23 Dec 2024 16:53:26 +0100 Subject: [PATCH 01/15] Add riir post --- site/_posts/2024-12-25-rustport.md | 316 +++++++++++++++++++++++++++++ 1 file changed, 316 insertions(+) create mode 100644 site/_posts/2024-12-25-rustport.md diff --git a/site/_posts/2024-12-25-rustport.md b/site/_posts/2024-12-25-rustport.md new file mode 100644 index 000000000..07aa888dd --- /dev/null +++ b/site/_posts/2024-12-25-rustport.md @@ -0,0 +1,316 @@ +--- +layout: post +title: "Fish 4.0: The Fish Of Theseus" +date: 2024-12-25 +categories: technical +--- + +About two years ago, our head maintainer @ridiculousfish opened what quickly became our most-read pull request: + +- [#9512 - Rewrite it in Rust](https://github.com/fish-shell/fish-shell/pull/9512) + +Truth be told, we did not quite expect that to be as popular as it was. +It was written as a bit of an in-joke for the fish developers first, and not really as a press release to be shared far and wide. +We didn't post it anywhere, but it was commented far and wide. + +Observant readers will note that the PR was a proposal to rewrite the entirety of fish in Rust, from C++. + +The fish-shell is no stranger to language changes - it was ported from pure C to C++ earlier in its life, +but this was a much bigger project, porting to a much more different language that didn't even exist when fish was started in 2007. + +Now that we've released the beta, containing 0% C++ and almost 100% pure Rust, let's look back to see what we've learned, what went well, what could have gone better and what we can do now. + +We're writing this so others can learn from our experience. Even if you have never written any Rust, you should be able to follow along. +Experience with a roughly C++-shaped language should help. + +## Why are we doing this again? + +We've experienced some pain with C++. 
In short: + +- tools +- slow-moving, but slower still until new versions are usable +- ergonomics +- compiler and platform differences +- (thread) safety +- dependency handling + +Frankly, the tooling around the language isn't good, and we had to take on some additional pain in order to support our users. +We want them to have an easy way to get the newest version, and we want to have contributions even by people who aren't on bleeding edge systems[^Contributions]. +That means we want it to be easy to build fish from source, and we want to build our own packages to have something to tell people who are on, +say, Ubuntu LTS and noticed that they are missing a cool new feature. We also prefer if we don't get reports of bugs +that we already fixed two versions ago[^LTS]. + +Doing that meant we could never rely on the newest C++ features. We started using C++11 in 2016, +and yet we *still* needed to upgrade the compilers on our build machines until 2020. +And upgrading the C++ compiler is annoying. + +Fish also uses threads for its award-winning (*note to editor*: find an actual award) autosuggestions and syntax highlighting, +and one long-term project is to add concurrency to the language. + +Here's a dirty secret: fish's script execution is currently entirely serial - you can't background functions, +and you can't even run builtins in parallel. This code: + +```fish +for i in 1 2 3 4 5 + sleep 1 +end | while read -l num + break +end +``` + +takes 5 seconds, because the `while read` loop can't even run before the `for` loop completes. + +POSIX shells use subshells to get around this, but subshells are a leaky abstraction that can bite you in the behind when you least expect it. +For instance you can't set variables from inside a pipe like that (except on some shells, but only in the last part of the pipe, maybe, if you have enabled the correct option). +We would like to avoid that, and so the heavy hand of forking off a process isn't appealing. 
+ +This involves a lot of careful handling of shared state, and C++ famously does not help - thread safety is your responsibility as the programmer. + +The ergonomics of C++ are also simply not good - header files are annoying, templates are complicated, you can easily cause a compile error that throws *pages* of overloads in the standard library at you. Many functions are unsafe to use. C++ string handling is very verbose with +easily confusable overloads of many methods, making it attractive to drop down to C-style char pointers, which are quite unsafe. + +And the standard prioritizes performance over ergonomics. Consider for instance string_view, which provides a non-owning slice of a string. This is an extremely modern, well-liked feature that C++ programmers often claim is a great reason to switch to C++17. And it is extremely easy to run into use-after-free bugs with it, because the ergonomics weren't a priority. + +And even *if* the standard decided to add features that we would like, it would take three years for a new standard to release plus some time for compilers to gain support *plus* time for these compiler versions to become ubiquitous so we can require them. Again: Upgrading the C++ compiler is a pain. + +One good case study of the deficiencies of C++-in-practice is a C library: curses. This is a venerable library to access terminal features, and we exclusively use it to gain access to the terminfo database, which describes differences in terminal features and behavior. + +This not only caused us grief by being unsafe to use in weird ways - the "cur_term" pointer (or sometimes macro!) can be NULL, and it is dereferenced in surprising places, but also caused a surprisingly high number of issues when building from source. 
This was either because there are multiple implementations of it with differences as useless as "this function takes a char on system X but an int on system Y", but also because users kept coming to us with new and exciting(ly terrible) ways to package and install it. The dependency system is the system package manager. + +Finally, subjectively, C++ isn't drawing in the crowds. We have never had a lot of C++ contributors. Over the 11 years fish used C++, only 17 people have at least 10 commits to the C++ code. We also don't know a lot of people who would love to work on a C++ codebase in their free time. + +Some parting thoughts we can give the C++ community: We would like to see improvements to ergonomics and safety of the language and the tools prioritized over performance, and we would like to see efforts to make C++ compilers easier to upgrade on real systems. + +## Why Rust? + +We need to get one thing out of the way: Rust is cool. It's fun. + +It's tempting to try to sweep this under the rug because it feels gauche to say, but it's actually important for a number of reasons. + +For one, fish is a hobby project, and that means we want it to be fun for us. Nobody is being paid to work on fish, so we *need* it to be fun. +Being fun and interesting also attracts contributors. + +Rust also has great tooling. The tools have really paid a lot of attention to use, and the compiler errors are terrific. Not even "compared to C++", they just actually rule. And as we have tried to pay attention to our own error messages (fish has a bespoke error for if it thinks a file you told it to run has Windows line endings), +we like it. + +And it is *easy* to get that tooling installed - `rustup` is magic, and allows people to get started quickly, with minimal fuss or root permissions. 
+When the answer to "how to upgrade C++ compiler" is "find a repository (with root permissions), compile it yourself, install some *other* repository or a docker image", +it is amazing how the Rust answer can just be "use rustup". + +Rust has great ergonomics - the difference between C++'s pointers (which can always be NULL) and Rust's Options are apparent very quickly even to those of us who had never used it before. We did have a backport of C++'s optional, and liked using it, but it was never as integrated as Rust's Options were. + +Having an explicit `use` system where you know exactly which function comes from which module is a great improvement over `#include`. + +Rust makes it nice to add dependencies. We don't want to go overboard with it, but we do want to change our history format from our homegrown "I can't believe it's not YAML" to something specified that other tools can actually read, and Rust makes it easy to add support for YAML/JSON/KDL. + +And yes, Rust promises to help us with our threading problem. + +We did not do a comprehensive survey of other languages. We were confident Rust was up to the task and either already knew it or wanted to learn it, so we picked it. + +## Platform Support + +A lot of hay has also been made online about Rust's platform support (e.g. [in the git project](https://lwn.net/Articles/998115/)). We don't see a big problem here - all of our big platforms (macOS, Linux, the BSDs) are supported, as are Opensolaris/Illumos and Haiku. We have never heard of anyone trying to run fish on NonStop. + +Architecture support is even less of a problem - going by [debian's popcon](https://popcon.debian.org/), 99.9995% (the actual result, not an exaggeration) of machines run an architecture that has Rust packages in Debian. 
Given that fish is [installed on 1.92% of Debian systems](https://qa.debian.org/popcon.php?package=fish), we would project two (2) or three (3) machines of the quarter million responses to have fish on an unsupported architecture [^stats]. + +Unlike what some online have assumed, a native Windows port was not a reason for switching to Rust as it was never in the cards. Fish is, at heart, a unix shell that relies not only on unix APIs but also their semantics, and exposes them in the scripting language. What would `test -x` say on Windows, which has no executable bit? These are issues that *could* be solved with a lot of work, but we're unix nerds making a unix shell, not one for Windows. + +The one platform we care about a bit that it does not currently seem to have enough support for is Cygwin, which is sad but we have to make a cut somewhere. + +## The story of the port + +We had decided we were gonna do a "Fish Of Theseus" port - we would move over, component by component, until no C++ was left. +And at every stage of that process, it would remain a working fish. + +This was a necessity - if we didn't, we would not have a working program for months, which is not only demoralizing but would also have precluded us from +using most of our test suite - which is end-to-end tests that run a script or fake a terminal interaction. We would also not have been able to do another C++ release, +putting some cool improvements into the hands of our users. + +Had we chosen to disappear into a hole we might not have finished at all, and we would have to re-do a bunch of work once it became testable. +We also mostly kept the structure of the C++ code intact - if a function is in the "env" subsystem, it would stay there. Resisting the temptation to +clean up allowed us to compare the before and after to find places where we had mistranslated something. 
+ +So we used [autocxx](https://google.github.io/autocxx/) to generate bindings between C++ and Rust code, allowing us to port one component at a time. + +We started by porting the builtins. These are essentially little self-contained programs, with their own arguments, streams, exit code, etc. +That means it's easy to port them separately from the rest of the shell once you have a way to call a Rust builtin from C++, which we had as part of the initial pull request. + +Where they connected to the main shell, we used one of three approaches: + +1. Add some FFI glue to the C++ to make it callable from Rust, port the caller and leave the callee for later +2. Move the callee to Rust and, if necessary, make it callable from C++ +3. Write a Rust version of the callee and call it from the ported caller, but leave the C++ version around + +For instance, almost every builtin needs to parse its options. We have our own implementation of getopt, that we reimplemented in Rust in the initial PR, +but the C++ version stuck around until it had no more callers remaining. Otherwise we would have had to write a C++-to-Rust bridge and adjust the C++ callers to use it. + +Or the `builtin` builtin needs access to the names of all builtins to print them for `builtin --get-names`. In that case we bridged some access to what amounts to a constant vector of strings in the C++, and eventually moved it over once the users were in Rust. + +That's how it went for a while, but we finally hit the more entangled systems, where porting larger chunks felt more productive, +since that reduced the amount of tricky FFI code to be written only to be thrown away. These were ported in solo efforts. +This includes the input/output "reader", which is, unsurprisingly, one of fish's biggest parts, ending up at about 13000 lines of Rust. + +During the port, we hit a bunch of snags with (auto)cxx. 
Sometimes it would just not understand a particular C++ construct, and we spent a lot of time trying to figure out ways to please it. As an example, we introduced a struct on the C++ side that wrapped C++'s `vector`, because for some reason autocxx liked to complain about `vector`. It lacks support for wstring/wchar, which is understandable because using wchar is a horrible decision - we only do it because it's a historical mistake. + +Similarly, we had to wrap some C++ variables in `unique_ptr` and similar to make the ownership rules understandable to (auto)cxx. + +We also patched autocxx to remove the requirement to use `unsafe` to invoke any C++ API, because that would have obscured uses of `unsafe` that wouldn't disappear just by porting the callee. We were building something temporary, so sometimes it is okay to do something a little underhanded. +If you used this for a permanent bridge between Rust and C++ in a few parts of your code, the `unsafe` markers might be useful, but in our case they were noise. + +Because autocxx generated a lot of code, some tools also were less helpful than they'd usually be. rust-analyzer for instance was extremely slow. + +So, even though our codebase was fairly amenable to being moved to Rust because we didn't use exceptions or a lot of templates, autocxx isn't the easiest to work with. +It is absolutely magical that it works at all, and it enabled us to do this port, but it has a hard task to perform and isn't perfect at it. + +### The Timeline + +- The initial PR was opened on 28th January 2023, merged on 19th February 2023 + +- fish 3.7.0, another release in the C++ branch to flush out some accumulated improvements, was released in January 2024 + +- The last C++ code was removed in January 2024 (and some additional test code was ported from C++ to C 12th of June 2024) + +- The first beta was released 17th of December 2024 + +The initial PR had a timeline of "handwaving, half a year". 
It was clear to all of us that it might very well be entirely off, and we're not +disappointed that it was. Frankly, 14 months was still a pretty good pace, especially considering that we made a C++ release in-between, so it did not throw off our usual release cadence. + +Most of the work was done by 7 people (going by those with at least 10 commits to ".rs" files), but we got a lot of help from interested community members. + +The delay after that was down to a few reasons: + +1. The "second 90%" - testing that everything worked. We flushed out a lot of bugs in this time, and if we had made a release then, it would have been a bad one. +2. Having something to release that's visible to users - there's no point in making a release that does the same thing in new code, you need it to do different things. + So we held off until we had something. +3. Simple availability - sometimes, some of us took time off. + +So if you are trying to draw any conclusions from this, consider the context: A group of people working on a thing in their free time, +diverting some effort to work on something else, *and* deciding that after the work is finished it actually isn't. + +## The Gripes + +It won't surprise anyone who has spent any time on this world of ours that Rust is not, in fact, perfect. We have some gripes with it. + +Chief among them is how Rust handles portability. While it offers many abstractions over systems, allowing you to target a variety of systems with the same code, +when it comes to *adapting* your code to systems at a lower level, it's all based on enumerating systems by hand, using checks like `#[cfg(any(target_os = "freebsd", target_os = "netbsd", target_os = "openbsd"))]`. + +This is an imperfect solution, allowing you to miss systems and ignoring version differences entirely. From what we can tell, if FreeBSD 12 gains a function that we want to use, libc would add it, but calling it would then fail on FreeBSD 11 without a good way to check, at the moment. 
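As a rough sketch of what such hand-enumerated checks look like in practice, consider the snippet below. The function names `platform_family` and `is_listed_bsd` are made up for this illustration and are not taken from fish's code; only the `cfg` predicates themselves are standard Rust.

```rust
// Hand-enumerated target check, written as an attribute. On targets outside
// the list this function is stripped out before type checking.
#[cfg(any(target_os = "freebsd", target_os = "netbsd", target_os = "openbsd"))]
fn platform_family() -> &'static str {
    "bsd"
}

// Fallback for every target that is not listed above. A BSD that was
// forgotten in the list (DragonFly, say) silently ends up here.
#[cfg(not(any(target_os = "freebsd", target_os = "netbsd", target_os = "openbsd")))]
fn platform_family() -> &'static str {
    "other"
}

// The same enumeration as an expression: `cfg!` expands to `true` or `false`,
// so code using it is compiled and type-checked on every target.
fn is_listed_bsd() -> bool {
    cfg!(any(target_os = "freebsd", target_os = "netbsd", target_os = "openbsd"))
}

fn main() {
    println!(
        "family: {}, listed BSD: {}",
        platform_family(),
        is_listed_bsd()
    );
}
```

Note that the list of systems has to be repeated in every place that needs it, and nothing in the compiler points out a system that was left off.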
+ +But listing targets in our code is also fundamentally duplicating work that the libc crate (in our case) has already done. If you want to call libc::X, which is only defined on systems A, B and C, you need to put in that check for A, B and C yourself and if libc adds system D you need to add it as well. Instead of doing that, we are using our own [rsconf](https://github.com/mqudsi/rsconf) crate to do compile-time feature detection in build.rs. + +Most of this would be solved if Rust had some form of saying "compile this if that function exists" - `#[cfg(has_fn = "fstatat")]`. With that, the libc crate could do whatever checks it wants and fish would just follow what it did, and we could remove a lot of the use for rsconf. It would not really help support older distributions that lack some features, though. That could be solved by something like the [min_target_API_version](https://github.com/rust-lang/rfcs/pull/3036) cfg. + +While we're on portability, the tools also sometimes fail to consider other targets - clippy may warn about a conversion being useless when it isn't on another system. It is also often better to use `if cfg!(...)` instead of `#[cfg(...)]` because code behind the latter is eliminated very early, so it may be entirely wrong and the error only shows up when building on the affected system. + +We've also had issues with localization - a lot of the usual Rust relies on format strings that are checked at compile-time, but unfortunately they aren't translatable. +We ported printf from musl, which we required for our own `printf` builtin anyway, which allows us to reuse our preexisting format strings at runtime. + +### The Mistakes + +We've hit some false starts, dead ends and other kinds of mistakes. For instance, we originally used a fancy macro to allow us to write our strings as `"foo"L`, but that did not end up carrying its weight and we removed it in favor of a regular `L!("foo")` macro call. 
+ +We were confused by a deprecation warning in the libc crate, which explains that "time_t" will be switched to 64-bit on musl in the future. +We initially tried to work around it, adding a lot of wrappers to try to stay agnostic on that size, but only later figured out that it does not affect us, +as we do not pass a time_t we get from one C library to another. (https://github.com/fish-shell/fish-shell/issues/10634) + +Some bugs appeared because we missed subtleties of the original code. +Often this turned into a crash because we used asserts or assert's modern cousin ".unwrap()". This was often the easiest way to translate the C++, +and sometimes it simply turned out to be inaccurate, and had to be replaced with different error handling. + +But overall most of these were, once found, pretty shallow - "it panics here, why would it do that? Oh, this can be an Err? Okay, what leads to that? Ah, okay, let's handle that in this way". + +We've also caused some friction by turning on link-time optimization combined with having release builds as the default in CMake (currently needed to run the full test suite), +which makes it easy to accidentally have very long build times. + +## The Good + +A lot of the benefits of porting to Rust will appear over time, but some are already here. + +Remember our issues with (n)curses? We will no longer have any, because we no longer use curses. Instead we switched to [a Rust crate]() that gives us just what we need, which is access to terminfo and expanding its sequences. This removes some awkward global state, and means those building from source no longer need to ensure that curses is installed "correctly" on their system - cargo just downloads a crate and builds it. + +We do still read terminfo, which means users need to install that, but that can be done at runtime, is preinstalled on all mainstream systems *and* if it can't be found we just use an included copy of the xterm-256color definitions[^terminfo]. 
+ +We have also managed to create "self-installable" fish packages that include all the functions, completions and other asset files in the fish binary to be written out at runtime. +That allowed us to create statically linked versions of fish (for linux this uses musl, because glibc has unavoidable crashes!), so for the first time we have *one file* you can download and run on *any linux* (the only requirement being that the architecture matches!). + +This is a pretty big boon for people who want to use fish but sometimes ssh to servers, where they might not have root access to install a package. So they can just `scp` a single file and it's available. + +This might be possible with C23's `#embed`, but Rust allowed us to do it now and, overall, pretty easily. + +## The Sad + +The one goal of the port we did not succeed in was removing CMake. + +That's because, while `cargo` is great at *building* things, it is very simplistic at *installing* them. Cargo wants everything in a few neat binaries, +and that isn't our use case. Fish has about 1200 .fish scripts (961 completions, 217 associated functions), as well as about 130 pages of documentation (as html and man pages), +and the web-config tool and the man page generator (both written in python). + +It also has a test suite that is light on unit tests but heavy on end-to-end script and interactive tests. The scripted tests run through our own littlecheck tool, +which runs a script and compares its output to embedded comments. The interactive tests are driven by pexpect, which fakes terminal interaction and checks that the right thing happens when you press buttons. + +We kept cmake, in a simplified form, for these tasks, but let it hand over the responsibility of *building* to cargo. + +It would be possible to switch all that to a simpler task runner like Just or even plain old makefiles, but since we already have this system we're keeping it for now. 
+The upside is that the build process hasn't really changed for packagers. + +We're also losing Cygwin as a supported platform for the time being, because there is no Rust target for Cygwin and so no way to build binaries targeting it. +We hope that this situation changes in future, but we had also hoped it would improve during the almost two years of the port. +For now, the only way to run fish on Windows is to use WSL. + +## The Now + +We've succeeded. This was a gigantic project and *we made it*. The sheer scale of this is perhaps best expressed in numbers: + +- 1155 files changed, 110247 insertions(+), 88941 deletions(-) (excluding translations) +- 2604 commits by over 200 authors +- 498 issues +- Almost 2 years of work +- 57K Lines of C++ to 75K Lines of Rust [^formatting] (plus 400 lines of C [^ccode]) +- [C++--](https://github.com/fish-shell/fish-shell/pull/10564) + +The beta works very well. Performance is usually slightly better in terms of time taken, memory use has a slightly higher floor but a lower ceiling - it will use 8M instead of 7M at rest, but e.g. globbing a big directory won't make it go up as much. These things can all be improved, of course, but for a first result it is encouraging. + +Fish is still a bit of an odd duck...fish as a Rust program. It has some bits that smell like C spirit, directly using the C API and passing around file descriptors instead of File objects, it still uses UTF-32 strings - which is why we are using a fork of the pcre2 crate because we couldn't convince the pcre2-crate maintainer to add UTF-32 support. We hope to find a nicer solution here, but it wasn't necessary for the first release. + +The port wasn't without challenges, and it did not all go *entirely* as planned. But overall, it went pretty dang well. 
We're now left with a codebase that we like a lot more, that has already gained some features that would have been much more annoying to add with C++, +with more on the way, and we did it while creating a separate 3.7 release that also included some cool stuff. + +And we had fun doing it. + +-------------------------- + +[^Contributions]: We rely on contributions from as diverse a set of people as we can for our completion scripts. We can only really get a completion script for a tool from + someone who knows that tool. And ideally, they would also test their script with the newest source from git - + both to get more testing and to take advantage of new features we introduce. + So we want to make this as painless as possible. This is working rather well, overall - we have over 1000 completion scripts in our codebase. + +[^LTS]: The idea that LTS users should report bugs to their distribution is basically fiction. Not only does it not happen but it would also + be a terrible idea given that fish is in Ubuntu's "universe" repository, meaning it is imported automatically from Debian and otherwise almost entirely unmaintained in Ubuntu. + +[^stats]: That is assuming that there isn't a correlation between running fish and using an unusual processor architecture. Also this includes Hurd and kFreeBSD. + +[^terminfo]: We have discussed switching to not reading terminfo at all because in practice it is almost entirely useless. 
(we could write another 3000 words on the topic, but the short of it is that it is slow to update and integrate new features, often wrong, has no versioning mechanism and, most importantly, documents differences that barely exist anymore in the types of terminals that people actually use) + +[^formatting]: A lot of the increase in line count can be explained by rustfmt's formatting, as it likes to spread code out over multiple lines, like: + ```rust + if opts.show + && (opts.local + || opts.function + || opts.global + || opts.erase + || opts.list + || opts.exportv + || opts.universal) + ``` + + which was one line in our C++ version. + + The rest is additional features. + + Also note that our Rust code is in some places a straight translation of the C++, and fully idiomatic Rust might be shorter. + +[^ccode]: We use C in three places: + - To connect some functions or variables that aren't (yet) in the libc crate + - To do compile-time feature detection + - In our fish_test_helper binary, which mocks some unix behaviors for tests + (things like "print blocked signals" or "acquire the terminal") From c7eed18b01ebee2900b090d6c1b63a0e9892645d Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Mon, 23 Dec 2024 21:36:32 +0100 Subject: [PATCH 02/15] link ship of theseus --- site/_posts/2024-12-25-rustport.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/_posts/2024-12-25-rustport.md b/site/_posts/2024-12-25-rustport.md index 07aa888dd..a86f8aa4f 100644 --- a/site/_posts/2024-12-25-rustport.md +++ b/site/_posts/2024-12-25-rustport.md @@ -119,7 +119,7 @@ The one platform we care about a bit that it does not currently seem to have eno ## The story of the port -We had decided we were gonna do a "Fish Of Theseus" port - we would move over, component by component, until no C++ was left. 
+We had decided we were gonna do a "Fish [Of Theseus](https://en.wikipedia.org/wiki/Ship_of_Theseus)" port - we would move over, component by component, until no C++ was left. And at every stage of that process, it would remain a working fish. This was a necessity - if we didn't, we would not have a working program for months, which is not only demoralizing but would also have precluded us from From 9ec87efed603185900ea0714adba0e68be3194b0 Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Tue, 24 Dec 2024 15:35:29 +0100 Subject: [PATCH 03/15] tighten wording --- site/_posts/2024-12-25-rustport.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/site/_posts/2024-12-25-rustport.md b/site/_posts/2024-12-25-rustport.md index a86f8aa4f..38f031cfe 100644 --- a/site/_posts/2024-12-25-rustport.md +++ b/site/_posts/2024-12-25-rustport.md @@ -11,17 +11,18 @@ About two years ago, our head maintainer @ridiculousfish opened what quickly bec Truth be told, we did not quite expect that to be as popular as it was. It was written as a bit of an in-joke for the fish developers first, and not really as a press release to be shared far and wide. -We didn't post it anywhere, but it was commented far and wide. +We didn't post it anywhere, but other people did, and we got a lot of reactions. Observant readers will note that the PR was a proposal to rewrite the entirety of fish in Rust, from C++. The fish-shell is no stranger to language changes - it was ported from pure C to C++ earlier in its life, but this was a much bigger project, porting to a much more different language that didn't even exist when fish was started in 2007. -Now that we've released the beta, containing 0% C++ and almost 100% pure Rust, let's look back to see what we've learned, what went well, what could have gone better and what we can do now. 
+Now that we've released the beta of fish 4.0, containing 0% C++ and almost 100% pure Rust, let's look back to see what we've learned, what went well, what could have gone better and what we can do now. -We're writing this so others can learn from our experience. Even if you have never written any Rust, you should be able to follow along. -Experience with a roughly C++-shaped language should help. +We're writing this so others can learn from our experience, but it is *our* experience and not an exhaustive study. +We hope that you'll be able to follow along even if you have never written any rust, but +experience with a roughly C++-shaped language should help. ## Why are we doing this again? @@ -71,7 +72,7 @@ easily confusable overloads of many methods, making it attractive to drop down t And the standard prioritizes performance over ergonomics. Consider for instance string_view, which provides a non-owning slice of a string. This is an extremely modern, well-liked feature that C++ programmers often claim is a great reason to switch to C++17. And it is extremely easy to run into use-after-free bugs with it, because the ergonomics weren't a priority. -And even *if* the standard decided to add features that we would like, it would take three years for a new standard to release plus some time for compilers to gain support *plus* time for these compiler versions to become ubiquitous so we can require them. Again: Upgrading the C++ compiler is a pain. +And when the standard does add cool features, it takes up to three years for it to release plus some time for compilers to gain support *plus* time for these compiler versions to become ubiquitous so we can require them. Again: Upgrading the C++ compiler is a pain that we try not to inflict on our users. One good case study of the deficiencies of C++-in-practice is a C library: curses. 
This is a venerable library to access terminal features, and we exclusively use it to gain access to the terminfo database, which describes differences in terminal features and behavior. @@ -270,7 +271,7 @@ We've succeeded. This was a gigantic project and *we made it*. The sheer scale o The beta works very well. Performance is usually slightly better in terms of time taken, memory use has a slightly higher floor but a lower ceiling - it will use 8M instead of 7M at rest, but e.g. globbing a big directory won't make it go up as much. These things can all be improved, of course, but for a first result it is encouraging. -Fish is still a bit of an odd duck...fish as a Rust program. It has some bits that smell like C spirit, directly using the C API and passing around file descriptors instead of File objects, it still uses UTF-32 strings - which is why we are using a fork of the pcre2 crate because we couldn't convince the pcre2-crate maintainer to add UTF-32 support. We hope to find a nicer solution here, but it wasn't necessary for the first release. +Fish is still a bit of an odd duck...fish as a Rust program. It has some bits that smell like C spirit, directly using the C API and e.g. passing around file descriptors instead of File objects. It still uses UTF-32 strings - which is why we are using a fork of the pcre2 crate because we couldn't convince the pcre2-crate maintainer to add UTF-32 support. We hope to find a nicer solution here, but it wasn't necessary for the first release. The port wasn't without challenges, and it did not all go *entirely* as planned. But overall, it went pretty dang well. We're now left with a codebase that we like a lot more, that has already gained some features that would have been much more annoying to add with C++, with more on the way, and we did it while creating a separate 3.7 release that also included some cool stuff. 
From 8a3c8019dbff8abc37facd082930f25fd2aaf2f4 Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Tue, 24 Dec 2024 15:44:54 +0100 Subject: [PATCH 04/15] Some more about autocxx/copying/performance during the port --- site/_posts/2024-12-25-rustport.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/_posts/2024-12-25-rustport.md b/site/_posts/2024-12-25-rustport.md index 38f031cfe..f3a3e035b 100644 --- a/site/_posts/2024-12-25-rustport.md +++ b/site/_posts/2024-12-25-rustport.md @@ -153,7 +153,7 @@ This includes the input/output "reader", which is, unsurprisingly, one of fish's During the port, we hit a bunch of snags with (auto)cxx. Sometimes it would just not understand a particular C++ construct, and we spent a lot of time trying to figure out ways to please it. As an example, we introduced a struct on the C++ side that wrapped C++'s `vector`, because for some reason autocxx liked to complain about `vector`. It lacks support for wstring/wchar, which is understandable because using wchar is a horrible decision - we only do it because it's a historical mistake. -Similarly, we had to wrap some C++ variables in `unique_ptr` and similar to make the ownership rules understandable to (auto)cxx. +Similarly, we had to wrap some C++ variables in `unique_ptr` and similar to make the ownership rules understandable to (auto)cxx, or copy values that didn't strictly need to be copied. This caused the performance during the port to go down quite a bit, but we regained all of it in most spots, and even beat the C++ version in some. We also patched autocxx to remove the requirement to use `unsafe` to invoke any C++ API, because that would have obscured uses of `unsafe` that wouldn't disappear just by porting the callee. We were building something temporary, so sometimes it is okay to do something a little underhanded. 
If you used this for a permanent bridge between Rust and C++ in a few parts of your code, the `unsafe` markers might be useful, but in our case they were noise. From 412fc762f4dcc8911ade6f0904bea6ab4276cb85 Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Tue, 24 Dec 2024 15:47:40 +0100 Subject: [PATCH 05/15] Add note on widecharwidth --- site/_posts/2024-12-25-rustport.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/site/_posts/2024-12-25-rustport.md b/site/_posts/2024-12-25-rustport.md index f3a3e035b..f939c05cf 100644 --- a/site/_posts/2024-12-25-rustport.md +++ b/site/_posts/2024-12-25-rustport.md @@ -133,7 +133,7 @@ clean up allowed us to compare the before and after to find places where we had So we used [autocxx](https://google.github.io/autocxx/) to generate bindings between C++ and Rust code, allowing us to port one component at a time. -We started by porting the builtins. These are essentially little self-contained programs, with their own arguments, streams, exit code, etc. +We started[^technically] by porting the builtins. These are essentially little self-contained programs, with their own arguments, streams, exit code, etc. That means it's easy to port them separately from the rest of the shell once you have a way to call a Rust builtin from C++, which we had as part of the initial pull request. Where they connected to the main shell, we used one of three approaches: @@ -290,6 +290,9 @@ And we had fun doing it. [^stats]: That is assuming that there isn't a correlation between running fish and using an unusual processor architecture. Also this includes Hurd and kFreeBSD. +[^technically]: Technically the first part of fish to be switched to rust is our [widecharwidth library](https://github.com/ridiculousfish/widecharwidth), + which already had a rust port that is used in Wezterm. + [^terminfo]: We have discussed switching to not reading terminfo at all because in practice it is almost entirely useless. 
(we could write another 3000 words on the topic, but the short of it is that it is slow to update and integrate new features, often wrong, has no versioning mechanism and, most importantly, documents differences that barely exist anymore in the types of terminals that people actually use) [^formatting]: A lot of the increase in line count can be explained by rustfmt's formatting, as it likes to spread code out over multiple lines, like: From 04fb5aa50bab1120bca2dec34ec4b3dd9e76da8d Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Tue, 24 Dec 2024 15:47:50 +0100 Subject: [PATCH 06/15] Add rust-terminfo link --- site/_posts/2024-12-25-rustport.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/_posts/2024-12-25-rustport.md b/site/_posts/2024-12-25-rustport.md index f939c05cf..3c5046365 100644 --- a/site/_posts/2024-12-25-rustport.md +++ b/site/_posts/2024-12-25-rustport.md @@ -227,7 +227,7 @@ which makes it easy to accidentally have very long build time. A lot of the benefits of porting to Rust will appear over time, but some are already here. -Remember our issues with (n)curses? We will no longer have any, because we no longer use curses. Instead we switched to [a Rust crate]() that gives us just what we need, which is access to terminfo and expanding its sequences. This removes some awkward global state, and means those building from source no longer need to ensure that curses is installed "correctly" on their system - cargo just downloads a crate and builds it. +Remember our issues with (n)curses? We will no longer have any, because we no longer use curses. Instead we switched to [a Rust crate](https://github.com/meh/rust-terminfo) that gives us just what we need, which is access to terminfo and expanding its sequences. This removes some awkward global state, and means those building from source no longer need to ensure that curses is installed "correctly" on their system - cargo just downloads a crate and builds it. 
We do still read terminfo, which means users need to install that, but that can be done at runtime, is preinstalled on all mainstream systems *and* if it can't be found we just use an included copy of the xterm-256color definitions[^terminfo]. From 89d0e08f2f36b999aec61ac08bddda31f9ac9977 Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Fri, 27 Dec 2024 21:44:02 +0100 Subject: [PATCH 07/15] Address feedback --- site/_posts/2024-12-25-rustport.md | 53 +++++++++--------------------- 1 file changed, 16 insertions(+), 37 deletions(-) diff --git a/site/_posts/2024-12-25-rustport.md b/site/_posts/2024-12-25-rustport.md index 3c5046365..6fabd3a1d 100644 --- a/site/_posts/2024-12-25-rustport.md +++ b/site/_posts/2024-12-25-rustport.md @@ -15,7 +15,7 @@ We didn't post it anywhere, but other people did, and we got a lot of reactions. Observant readers will note that the PR was a proposal to rewrite the entirety of fish in Rust, from C++. -The fish-shell is no stranger to language changes - it was ported from pure C to C++ earlier in its life, +Fish is no stranger to language changes - it was ported from pure C to C++ earlier in its life, but this was a much bigger project, porting to a much more different language that didn't even exist when fish was started in 2007. Now that we've released the beta of fish 4.0, containing 0% C++ and almost 100% pure Rust, let's look back to see what we've learned, what went well, what could have gone better and what we can do now. @@ -28,44 +28,26 @@ experience with a roughly C++-shaped language should help. We've experienced some pain with C++. In short: -- tools -- slow-moving, but slower still until new versions are usable -- ergonomics -- compiler and platform differences -- (thread) safety -- dependency handling +- tools and compiler/platform differences +- ergonomics and (thread) safety +- community Frankly, the tooling around the language isn't good, and we had to take on some additional pain in order to support our users. 
-We want them to have an easy way to get the newest version, and we want to have contributions even by people who aren't on bleeding edge systems[^Contributions]. -That means we want it to be easy to build fish from source, and we want to build our own packages to have something to tell people who are on, -say, Ubuntu LTS and noticed that they are missing a cool new feature. We also prefer if we don't get reports of bugs -that we already fixed two versions ago[^LTS]. - -Doing that meant we could never rely on the newest C++ features. We started using C++11 in 2016, -and yet we *still* needed to upgrade the compilers on our build machines until 2020. -And upgrading the C++ compiler is annoying. +We want to provide up-to-date fish packages for systems that aren't up-to-date, like LTS Linux and older macOS. +But there is no 'rustup' for C++, no standard way to install recent C++ compilers on these operating systems. +This means adopting recent C++ standards would complicate the lives of packagers and would-be contributors[^Contributions]. +For example, we started using C++11 in 2016, and yet we still needed to upgrade the compilers on our build machines until 2020. Fish also uses threads for its award-winning (*note to editor*: find an actual award) autosuggestions and syntax highlighting, and one long-term project is to add concurrency to the language. -Here's a dirty secret: fish's script execution is currently entirely serial - you can't background functions, -and you can't even run builtins in parallel. This code: - -```fish -for i in 1 2 3 4 5 - sleep 1 -end | while read -l num - break -end -``` - -takes 5 seconds, because the `while read` loop can't even run before the `for` loop completes. +Here’s a dirty secret: while external commands run in parallel, fish’s execution of internal commands (builtins and functions) is currently serial. Lifting this limitation will enable features like asynchronous prompts or non-blocking completions. 
POSIX shells use subshells to get around this, but subshells are a leaky abstraction that can bite you in the behind when you least expect it. For instance you can't set variables from inside a pipe like that (except on some shells, but only in the last part of the pipe, maybe, if you have enabled the correct option). We would like to avoid that, and so the heavy hand of forking off a process isn't appealing. -This involves a lot of careful handling of shared state, and C++ famously does not help - thread safety is your responsibility as the programmer. +We prototyped true multithreaded execution in C++, but it just didn't work out. For example, it was too easy to accidentally share objects across threads, with only post-hoc tools like Thread Sanitizer to prevent it. The ergonomics of C++ are also simply not good - header files are annoying, templates are complicated, you can easily cause a compile error that throws *pages* of overloads in the standard library at you. Many functions are unsafe to use. C++ string handling is very verbose with easily confusable overloads of many methods, making it attractive to drop down to C-style char pointers, which are quite unsafe. @@ -104,7 +86,7 @@ Having an explicit `use` system where you know exactly which function comes from Rust makes it nice to add dependencies. We don't want to go overboard with it, but we do want to change our history format from our homegrown "I can't believe it's not YAML" to something specified that other tools can actually read, and Rust makes it easy to add support for YAML/JSON/KDL. -And yes, Rust promises to help us with our threading problem. +But the killer feature of Rust, from fish-shell's perspective, is Send and Sync, statically enforcing rules around threading. "Fearless concurrency" is too strong - you can still blow your leg off with fork or signal handlers - but Send and Sync will be the key to unlocking fully multithreaded execution, with confidence in its correctness. 
We did not do a comprehensive survey of other languages. We were confident Rust was up to the task and either already knew it or wanted to learn it, so we picked it. @@ -114,7 +96,7 @@ A lot of hay has also been made online about Rust's platform support (e.g. [in t Architecture support is even less of a problem - going by [debian's popcon](https://popcon.debian.org/), 99.9995% (the actual result, not an exaggeration) of machines run an architecture that has Rust packages in Debian. Given that fish is [installed on 1.92% of Debian systems](https://qa.debian.org/popcon.php?package=fish), we would project two (2) or three (3) machines of the quarter million responses to have fish on an unsupported architecture [^stats]. -Unlike what some online have assumed, a native Windows port was not a reason for switching to Rust as it was never in the cards. Fish is, at heart, a unix shell that relies not only on unix APIs but also their semantics, and exposes them in the scripting language. What would `test -x` say on Windows, which has no executable bit? These are issues that *could* be solved with a lot of work, but we're unix nerds making a unix shell, not one for Windows. +Unlike what some online have assumed, a native Windows port was not a reason for switching to Rust as it was never in the cards. Fish is, at heart, a UNIX shell that relies not only on UNIX APIs but also their semantics, and exposes them in the scripting language. What would `test -x` say on Windows, which has no executable bit? These are issues that *could* be solved with a lot of work, but we're unix nerds making a unix shell, not one for Windows. The one platform we care about a bit that it does not currently seem to have enough support for is Cygwin, which is sad but we have to make a cut somewhere. @@ -151,7 +133,7 @@ That's how it went for a while, but we finally hit the more entangled systems, w since that reduced the amount of tricky FFI code to be written only to be thrown away. 
These were ported in solo efforts. This includes the input/output "reader", which is, unsurprisingly, one of fish's biggest parts, ending up at about 13000 lines of Rust. -During the port, we hit a bunch of snags with (auto)cxx. Sometimes it would just not understand a particular C++ construct, and we spent a lot of time trying to figure out ways to please it. As an example, we introduced a struct on the C++ side that wrapped C++'s `vector`, because for some reason autocxx liked to complain about `vector`. It lacks support for wstring/wchar, which is understandable because using wchar is a horrible decision - we only do it because it's a historical mistake. +During the port, we hit a bunch of snags with (auto)cxx. Sometimes it would just not understand a particular C++ construct, and we spent a lot of time trying to figure out ways to please it. As an example, we introduced a struct on the C++ side that wrapped C++'s `vector`, because for some reason autocxx liked to complain about `vector`. We had to fork it to add support for wstring/wchar, which is understandable because using wchar is a horrible decision - we only do it because it's a historical mistake. Similarly, we had to wrap some C++ variables in `unique_ptr` and similar to make the ownership rules understandable to (auto)cxx, or copy values that didn't strictly need to be copied. This caused the performance during the port to go down quite a bit, but we regained all of it in most spots, and even beat the C++ version in some. @@ -195,7 +177,7 @@ It won't surprise anyone who has spent any time on this world of ours that Rust Chief among them is how Rust handles portability. While it offers many abstractions over systems, allowing you to target a variety of systems with the same code, when it comes to *adapting* your code to systems at a lower-level, it's all based on enumerating systems by hand, using checks like `#[cfg(any(target_os = "freebsd", target_os = "netbsd", target_os = "openbsd"))]`. 
-This is an imperfect solution, allowing you to miss systems and ignoring version differences entirely. From what we can tell, if Freebsd 12 gains a function that we want to use, libc would add it, but calling it would then fail on FreeBSD 11 without a good way to check, at the moment. +This is an imperfect solution, allowing you to miss systems and ignoring version differences entirely. From what we can tell, if FreeBSD 12 gains a function that we want to use, libc would add it, but calling it would then fail on FreeBSD 11 without a good way to check, at the moment. But listing targets in our code is also fundamentally duplicating work that the libc crate (in our case) has already done. If you want to call libc::X, which is only defined on systems A, B and C, you need to put in that check for A, B and C yourself and if libc adds system D you need to add it as well. Instead of doing that, we are using our own [rsconf](https://github.com/mqudsi/rsconf) crate to do compile-time feature detection in build.rs. @@ -208,9 +190,9 @@ We ported printf from musl, which we required for our own `printf` builtin anywa ### The Mistakes -We've hit some false starts, dead ends and other kinds of mistakes, for instance we originally originally used a fancy macro to allow us to write our strings as `"foo"L`, but that did not end up carrying its weight and we removed it in favor of a regular `L!("foo")` macro call. +We've hit some false starts, dead ends and other kinds of mistakes. For instance we originally used a fancy macro to allow us to write our strings as `"foo"L`, but that did not end up carrying its weight and we removed it in favor of a regular `L!("foo")` macro call. -We we were confused by a deprecation warning in the libc crate, which explains that "time_t" will be switched to 64-bit on musl in the future. +We were confused by a deprecation warning in the libc crate, which explains that "time_t" will be switched to 64-bit on musl in the future. 
We initially tried to work around it, adding a lot of wrappers to try to stay agnostic on that size, but only later figured out that it does not affect us, as we do not pass a time_t we get from one C library to another. (https://github.com/fish-shell/fish-shell/issues/10634) @@ -285,9 +267,6 @@ And we had fun doing it. both to get more testing and to take advantage of new features we introduce. So we want to make this as painless as possible. This is working rather well, overall - we have over 1000 completion scripts in our codebase. -[^LTS]: The idea that LTS users should report bugs to their distribution is basically fiction. Not only does it not happen but it would also - be a terrible idea given that fish is in Ubuntu's "universe" repository, meaning it is imported automatically from Debian and otherwise almost entirely unmaintained in Ubuntu. - [^stats]: That is assuming that there isn't a correlation between running fish and using an unusual processor architecture. Also this includes Hurd and kFreeBSD. [^technically]: Technically the first part of fish to be switched to rust is our [widecharwidth library](https://github.com/ridiculousfish/widecharwidth), From 230bfcdecd97a012aa02d7d00fa8e3c59f3a1d6c Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Fri, 27 Dec 2024 22:05:04 +0100 Subject: [PATCH 08/15] Add some more to parallel execution --- site/_posts/2024-12-25-rustport.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/site/_posts/2024-12-25-rustport.md b/site/_posts/2024-12-25-rustport.md index 6fabd3a1d..f9104bbc4 100644 --- a/site/_posts/2024-12-25-rustport.md +++ b/site/_posts/2024-12-25-rustport.md @@ -41,10 +41,10 @@ For example, we started using C++11 in 2016, and yet we still needed to upgrade Fish also uses threads for its award-winning (*note to editor*: find an actual award) autosuggestions and syntax highlighting, and one long-term project is to add concurrency to the language. 
-Here’s a dirty secret: while external commands run in parallel, fish’s execution of internal commands (builtins and functions) is currently serial. Lifting this limitation will enable features like asynchronous prompts or non-blocking completions. +Here’s a dirty secret: while external commands run in parallel, fish’s execution of internal commands (builtins and functions) is currently serial and can't be backgrounded. Lifting this limitation will enable features like asynchronous prompts or non-blocking completions, as well as performance gains. POSIX shells use subshells to get around this, but subshells are a leaky abstraction that can bite you in the behind when you least expect it. -For instance you can't set variables from inside a pipe like that (except on some shells, but only in the last part of the pipe, maybe, if you have enabled the correct option). +For instance you can't set variables from inside a pipe (except on some shells, but only in the last part of the pipe, maybe, if you have enabled the correct option). We would like to avoid that, and so the heavy hand of forking off a process isn't appealing. We prototyped true multithreaded execution in C++, but it just didn't work out. For example, it was too easy to accidentally share objects across threads, with only post-hoc tools like Thread Sanitizer to prevent it. From a47512c57738b25390177d39388aad5705f09f0f Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Fri, 27 Dec 2024 22:06:38 +0100 Subject: [PATCH 09/15] Remove duplicate "upgrading compilers is hard" --- site/_posts/2024-12-25-rustport.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/site/_posts/2024-12-25-rustport.md b/site/_posts/2024-12-25-rustport.md index f9104bbc4..034579e6c 100644 --- a/site/_posts/2024-12-25-rustport.md +++ b/site/_posts/2024-12-25-rustport.md @@ -54,8 +54,6 @@ easily confusable overloads of many methods, making it attractive to drop down t And the standard prioritizes performance over ergonomics. 
Consider for instance string_view, which provides a non-owning slice of a string. This is an extremely modern, well-liked feature that C++ programmers often claim is a great reason to switch to C++17. And it is extremely easy to run into use-after-free bugs with it, because the ergonomics weren't a priority. -And when the standard does add cool features, it takes up to three years for it to release plus some time for compilers to gain support *plus* time for these compiler versions to become ubiquitous so we can require them. Again: Upgrading the C++ compiler is a pain that we try not to inflict on our users. - One good case study of the deficiencies of C++-in-practice is a C library: curses. This is a venerable library to access terminal features, and we exclusively use it to gain access to the terminfo database, which describes differences in terminal features and behavior. This not only caused us grief by being unsafe to use in weird ways - the "cur_term" pointer (or sometimes macro!) can be NULL, and it is dereferenced in surprising places, but also caused a surprisingly high number of issues when building from source. This was either because there are multiple implementations of it with differences as useless as "this function takes a char on system X but an int on system Y", but also because users kept coming to us with new and exciting(ly terrible) ways to package and install it. The dependency system is the system package manager. 
From 0f366ba786a92f5ebe0ad2318232e339fc9313a2 Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Fri, 27 Dec 2024 22:49:27 +0100 Subject: [PATCH 10/15] Titles --- site/_posts/2024-12-25-rustport.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/site/_posts/2024-12-25-rustport.md b/site/_posts/2024-12-25-rustport.md index 034579e6c..8010a6fd4 100644 --- a/site/_posts/2024-12-25-rustport.md +++ b/site/_posts/2024-12-25-rustport.md @@ -98,7 +98,7 @@ Unlike what some online have assumed, a native Windows port was not a reason for The one platform we care about a bit that it does not currently seem to have enough support for is Cygwin, which is sad but we have to make a cut somewhere. -## The story of the port +## The Story Of The Port We had decided we were gonna do a "Fish [Of Theseus](https://en.wikipedia.org/wiki/Ship_of_Theseus)" port - we would move over, component by component, until no C++ was left. And at every stage of that process, it would remain a working fish. @@ -238,7 +238,7 @@ We're also losing Cygwin as a supported platform for the time being, because the We hope that this situation changes in future, but we had also hoped it would improve during the almost two years of the port. For now, the only way to run fish on Windows is to use WSL. -## The Now +## The Present & The Future We've succeeded. This was a gigantic project and *we made it*. 
The sheer scale of this is perhaps best expresed in numbers: From 93a0cba1dbb022738d5f9f5b6315c913a91b53c9 Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Fri, 27 Dec 2024 22:49:56 +0100 Subject: [PATCH 11/15] Adjust date --- site/_posts/{2024-12-25-rustport.md => 2024-12-28-rustport.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename site/_posts/{2024-12-25-rustport.md => 2024-12-28-rustport.md} (99%) diff --git a/site/_posts/2024-12-25-rustport.md b/site/_posts/2024-12-28-rustport.md similarity index 99% rename from site/_posts/2024-12-25-rustport.md rename to site/_posts/2024-12-28-rustport.md index 8010a6fd4..b9844a297 100644 --- a/site/_posts/2024-12-25-rustport.md +++ b/site/_posts/2024-12-28-rustport.md @@ -1,7 +1,7 @@ --- layout: post title: "Fish 4.0: The Fish Of Theseus" -date: 2024-12-25 +date: 2024-12-28 categories: technical --- From e0100a580e0bb793705df5a35a33d7136232d370 Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Fri, 27 Dec 2024 22:54:30 +0100 Subject: [PATCH 12/15] slight rewording --- site/_posts/2024-12-28-rustport.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/_posts/2024-12-28-rustport.md b/site/_posts/2024-12-28-rustport.md index b9844a297..010261767 100644 --- a/site/_posts/2024-12-28-rustport.md +++ b/site/_posts/2024-12-28-rustport.md @@ -54,7 +54,7 @@ easily confusable overloads of many methods, making it attractive to drop down t And the standard prioritizes performance over ergonomics. Consider for instance string_view, which provides a non-owning slice of a string. This is an extremely modern, well-liked feature that C++ programmers often claim is a great reason to switch to C++17. And it is extremely easy to run into use-after-free bugs with it, because the ergonomics weren't a priority. -One good case study of the deficiencies of C++-in-practice is a C library: curses. 
This is a venerable library to access terminal features, and we exclusively use it to gain access to the terminfo database, which describes differences in terminal features and behavior. +One good case study of the deficiencies of C++-in-practice is a C library: curses. This is a venerable library to access terminal features, and we use it to access the terminfo database, which describes differences in terminal features and behavior. This not only caused us grief by being unsafe to use in weird ways - the "cur_term" pointer (or sometimes macro!) can be NULL, and it is dereferenced in surprising places, but also caused a surprisingly high number of issues when building from source. This was either because there are multiple implementations of it with differences as useless as "this function takes a char on system X but an int on system Y", but also because users kept coming to us with new and exciting(ly terrible) ways to package and install it. The dependency system is the system package manager. From 21d80358be82cd1b1ba766c74add14b834fe4b02 Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Sat, 28 Dec 2024 10:18:54 +0100 Subject: [PATCH 13/15] Build --- docs/blog/index.html | 5 + docs/blog/rustport/index.html | 354 ++++++++++++++++++++++++++++++++++ 2 files changed, 359 insertions(+) create mode 100644 docs/blog/rustport/index.html diff --git a/docs/blog/index.html b/docs/blog/index.html index efe144075..08198cd9f 100644 --- a/docs/blog/index.html +++ b/docs/blog/index.html @@ -28,6 +28,11 @@

fish shell blog

  • Fish 4.0: The Fish Of Theseus
  • About two years ago, our head maintainer @ridiculousfish opened what quickly became our most-read pull request:
    Read more
  • fish-shell 4.0b1, now in Rust
  • fish is a smart and user-friendly command line shell with clever features that just work, without needing an advanced degree in bash scriptology. Today we are announcing an open beta, inviting all users to try out the upcoming 4.0 release.

    diff --git a/docs/blog/rustport/index.html b/docs/blog/rustport/index.html new file mode 100644 index 000000000..ad3cf387b --- /dev/null +++ b/docs/blog/rustport/index.html @@ -0,0 +1,354 @@ + + + + + + + + Fish 4.0: The Fish Of Theseus + + + + + + + + + + +

    Fish 4.0: The Fish Of Theseus


    About two years ago, our head maintainer @ridiculousfish opened what quickly became our most-read pull request:

    • #9512 - Rewrite it in Rust (https://github.com/fish-shell/fish-shell/pull/9512)

    Truth be told, we did not quite expect that to be as popular as it was. It was written as a bit of an in-joke for the fish developers first, and not really as a press release to be shared far and wide. We didn’t post it anywhere, but other people did, and we got a lot of reactions.


    Observant readers will note that the PR was a proposal to rewrite the entirety of fish in Rust, from C++.


    Fish is no stranger to language changes - it was ported from pure C to C++ earlier in its life, but this was a much bigger project, porting to a very different language that didn’t even exist when fish was started in 2007.


    Now that we’ve released the beta of fish 4.0, containing 0% C++ and almost 100% pure Rust, let’s look back to see what we’ve learned, what went well, what could have gone better and what we can do now.


    We’re writing this so others can learn from our experience, but it is our experience and not an exhaustive study. We hope that you’ll be able to follow along even if you have never written any Rust, but experience with a roughly C++-shaped language should help.


    Why are we doing this again?


    We’ve experienced some pain with C++. In short:

    • tools and compiler/platform differences
    • ergonomics and (thread) safety
    • community

    Frankly, the tooling around the language isn’t good, and we had to take on some additional pain in order to support our users. We want to provide up-to-date fish packages for systems that aren’t up-to-date, like LTS Linux and older macOS. But there is no ‘rustup’ for C++, no standard way to install recent C++ compilers on these operating systems. This means adopting recent C++ standards would complicate the lives of packagers and would-be contributors[1]. For example, we started using C++11 in 2016, and yet we were still having to upgrade the compilers on our build machines as late as 2020.


    Fish also uses threads for its award-winning (note to editor: find an actual award) autosuggestions and syntax highlighting, and one long-term project is to add concurrency to the language.


    Here’s a dirty secret: while external commands run in parallel, fish’s execution of internal commands (builtins and functions) is currently serial and can’t be backgrounded. Lifting this limitation will enable features like asynchronous prompts or non-blocking completions, as well as performance gains.


    POSIX shells use subshells to get around this, but subshells are a leaky abstraction that can bite you in the behind when you least expect it. For instance you can’t set variables from inside a pipe (except on some shells, but only in the last part of the pipe, maybe, if you have enabled the correct option). We would like to avoid that, and so the heavy hand of forking off a process isn’t appealing.


    We prototyped true multithreaded execution in C++, but it just didn’t work out. For example, it was too easy to accidentally share objects across threads, with only post-hoc tools like Thread Sanitizer to prevent it.
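
As a rough illustration of the contrast (a sketch, not code from fish): in Rust, whether a value may cross a thread boundary is checked at compile time through the `Send` and `Sync` traits. The `JobCounter` type and the `run_workers` function below are hypothetical names made up for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for some shared shell state; not a fish type.
#[derive(Default)]
struct JobCounter {
    finished: u32,
}

/// Spawns `workers` threads that each record one finished job,
/// then returns the total that was recorded.
fn run_workers(workers: u32) -> u32 {
    // Arc<Mutex<T>> is Send + Sync, so the compiler allows it to cross threads.
    // Swapping Arc for Rc, or Mutex for RefCell, makes the `thread::spawn`
    // call below fail to compile instead of racing at runtime.
    let counter = Arc::new(Mutex::new(JobCounter::default()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                counter.lock().unwrap().finished += 1;
            })
        })
        .collect();

    // Wait for every worker before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let finished = counter.lock().unwrap().finished;
    finished
}

fn main() {
    println!("finished jobs: {}", run_workers(4));
}
```

The point of the sketch is that accidental sharing is rejected when the code is built, rather than detected afterwards by a tool such as Thread Sanitizer.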

    + +

    The ergonomics of C++ are also simply not good - header files are annoying, templates are complicated, you can easily cause a compile error that throws pages of overloads in the standard library at you. Many functions are unsafe to use. C++ string handling is very verbose with easily confusable overloads of many methods, making it attractive to drop down to C-style char pointers, which are quite unsafe.

    + +

    And the standard prioritizes performance over ergonomics. Consider for instance string_view, which provides a non-owning slice of a string. This is an extremely modern, well-liked feature that C++ programmers often claim is a great reason to switch to C++17. And it is extremely easy to run into use-after-free bugs with it, because the ergonomics weren’t a priority.
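For comparison, the closest Rust analog to string_view is the borrowed string slice, &str. The following is a minimal sketch, not code from fish; the function names are made up for illustration. It shows a slice that borrows from its owner, and, in a comment, the use-after-free pattern that the compiler refuses to build.

```rust
/// Returns the first whitespace-separated word of `line` as a borrowed slice.
/// The returned `&str` points into `line`, much like a C++ `string_view`,
/// but the compiler tracks that the slice must not outlive `line`.
fn first_word(line: &str) -> &str {
    line.split_whitespace().next().unwrap_or("")
}

/// Hypothetical helper showing a valid use: the owning `String` lives
/// for as long as the slice that borrows from it.
fn first_word_len(owner: String) -> usize {
    let word = first_word(&owner);
    word.len()
}

// The pattern that silently dangles with `string_view` is rejected at
// compile time, because the slice would outlive the string it points into:
//
// fn dangling_word() -> &'static str {
//     let owner = String::from("echo hello");
//     first_word(&owner) // rejected: `owner` is dropped when the function returns
// }
```

The slice is just as cheap as a string_view; the difference is that the lifetime relationship is part of the function signature rather than a convention the caller has to remember.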

    + +

    One good case study of the deficiencies of C++-in-practice is a C library: curses. This is a venerable library to access terminal features, and we use it to access the terminfo database, which describes differences in terminal features and behavior.

    + +

    This caused us grief in two ways. It is unsafe to use in weird ways - the “cur_term” pointer (or sometimes macro!) can be NULL, and it is dereferenced in surprising places. It also caused a surprisingly high number of issues when building from source, partly because there are multiple implementations of it with differences as useless as “this function takes a char on system X but an int on system Y”, and partly because users kept coming to us with new and exciting(ly terrible) ways to package and install it. The dependency system is the system package manager.

    + +

    Finally, subjectively, C++ isn’t drawing in the crowds. We have never had a lot of C++ contributors. Over the 11 years fish used C++, only 17 people have at least 10 commits to the C++ code. We also don’t know a lot of people who would love to work on a C++ codebase in their free time.

    + +

    Some parting thoughts we can give the C++ community: We would like to see improvements to ergonomics and safety of the language and the tools prioritized over performance, and we would like to see efforts to make C++ compilers easier to upgrade on real systems.

    + +

    Why Rust?

    + +

    We need to get one thing out of the way: Rust is cool. It’s fun.

    + +

    It’s tempting to try to sweep this under the rug because it feels gauche to say, but it’s actually important for a number of reasons.

    + +

    For one, fish is a hobby project, and that means we want it to be fun for us. Nobody is being paid to work on fish, so we need it to be fun. Being fun and interesting also attracts contributors.

    + +

    Rust also has great tooling. A lot of attention has clearly been paid to how the tools are used, and the compiler errors are terrific. Not even “compared to C++”, they just actually rule. And as we have tried to pay attention to our own error messages (fish has a bespoke error for when it thinks a file you told it to run has Windows line endings), we like that.

    + +

    And it is easy to get that tooling installed - rustup is magic, and allows people to get started quickly, with minimal fuss or root permissions. When the answer to “how to upgrade C++ compiler” is “find a repository (with root permissions), compile it yourself, install some other repository or a docker image”, it is amazing how the Rust answer can just be “use rustup”.

    + +

    Rust has great ergonomics - the difference between C++’s pointers (which can always be NULL) and Rust’s Options was apparent very quickly, even to those of us who had never used Rust before. We did have a backport of C++’s optional, and liked using it, but it was never as integrated as Rust’s Option is.
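A small sketch of what that integration looks like in practice. The lookup function and the variable table here are hypothetical and not taken from fish; the point is only that “no value” is part of the return type, so the caller has to handle it before it can touch the value.

```rust
/// Hypothetical lookup: returns the value for `name` if it is set.
/// Absence is expressed in the type instead of through a null pointer.
fn lookup<'a>(vars: &'a [(&'a str, &'a str)], name: &str) -> Option<&'a str> {
    vars.iter().find(|(k, _)| *k == name).map(|(_, v)| *v)
}

/// The caller cannot reach the value without deciding what `None` means.
fn describe(vars: &[(&str, &str)], name: &str) -> String {
    match lookup(vars, name) {
        Some(value) => format!("{}={}", name, value),
        None => format!("{} is not set", name),
    }
}
```

With a raw pointer, forgetting the NULL check compiles fine and fails at runtime; here, forgetting the None arm is a compile error.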

    + +

    Having an explicit use system where you know exactly which function comes from which module is a great improvement over #include.

    + +

    Rust makes it nice to add dependencies. We don’t want to go overboard with it, but we do want to change our history format from our homegrown “I can’t believe it’s not YAML” to something specified that other tools can actually read, and Rust makes it easy to add support for YAML/JSON/KDL.

    + +

    But the killer feature of Rust, from fish-shell’s perspective, is Send and Sync, statically enforcing rules around threading. “Fearless concurrency” is too strong - you can still blow your leg off with fork or signal handlers - but Send and Sync will be the key to unlocking fully multithreaded execution, with confidence in its correctness.
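To make the Send and Sync point concrete, here is a minimal sketch using only the standard library. It is an illustration of the compile-time rule, not a description of how fish plans to structure its threading, and the function name is made up.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Sums `values` on up to `workers` threads, sharing the running total
/// through an `Arc<Mutex<u64>>`, which is both `Send` and `Sync`.
fn parallel_sum(values: Vec<u64>, workers: usize) -> u64 {
    let total = Arc::new(Mutex::new(0u64));
    let workers = workers.max(1);
    // Round up so that at most `workers` chunks are produced.
    let chunk_size = ((values.len() + workers - 1) / workers).max(1);

    let handles: Vec<_> = values
        .chunks(chunk_size)
        .map(|chunk| {
            let chunk = chunk.to_vec();
            let total = Arc::clone(&total);
            // `thread::spawn` requires the closure (and everything it
            // captures) to be `Send`; the compiler checks this here.
            thread::spawn(move || {
                let partial: u64 = chunk.iter().sum();
                *total.lock().unwrap() += partial;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}

// Replacing `Arc<Mutex<u64>>` with `Rc<RefCell<u64>>` makes the
// `thread::spawn` call fail to compile, because `Rc` is not `Send`.
// The accidental sharing is caught before the program ever runs.
```

This is the difference from a post-hoc tool like Thread Sanitizer: the mistake is reported by the compiler rather than by a test run that happens to hit the race.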

    + +

    We did not do a comprehensive survey of other languages. We were confident Rust was up to the task and either already knew it or wanted to learn it, so we picked it.

    + +

    Platform Support

    + +

    A lot of hay has also been made online about Rust’s platform support (e.g. in the git project). We don’t see a big problem here - all of our big platforms (macOS, Linux, the BSDs) are supported, as are Opensolaris/Illumos and Haiku. We have never heard of anyone trying to run fish on NonStop.

    + +

    Architecture support is even less of a problem - going by debian’s popcon, 99.9995% (the actual result, not an exaggeration) of machines run an architecture that has Rust packages in Debian. Given that fish is installed on 1.92% of Debian systems, we would project two (2) or three (3) machines of the quarter million responses to have fish on an unsupported architecture 2.

    + +

    Unlike what some online have assumed, a native Windows port was not a reason for switching to Rust as it was never in the cards. Fish is, at heart, a UNIX shell that relies not only on UNIX APIs but also their semantics, and exposes them in the scripting language. What would test -x say on Windows, which has no executable bit? These are issues that could be solved with a lot of work, but we’re unix nerds making a unix shell, not one for Windows.

    + +

    The one platform we care about a bit that it does not currently seem to have enough support for is Cygwin, which is sad but we have to make a cut somewhere.

    + +

    The Story Of The Port

    + +

    We had decided we were gonna do a “Fish Of Theseus” port - we would move over, component by component, until no C++ was left. And at every stage of that process, it would remain a working fish.

    + +

    This was a necessity - if we didn’t, we would not have a working program for months, which is not only demoralizing but would also have precluded us from using most of our test suite - which is end-to-end tests that run a script or fake a terminal interaction. We would also not have been able to do another C++ release, putting some cool improvements into the hands of our users.

    + +

    Had we chosen to disappear into a hole we might not have finished at all, and we would have had to re-do a bunch of work once it became testable. We also mostly kept the structure of the C++ code intact - if a function was in the “env” subsystem, it would stay there. Resisting the temptation to clean up allowed us to compare the before and after to find places where we had mistranslated something.

    + +

    So we used autocxx to generate bindings between C++ and Rust code, allowing us to port one component at a time.

    + +

    We started3 by porting the builtins. These are essentially little self-contained programs, with their own arguments, streams, exit code, etc. That means it’s easy to port them separately from the rest of the shell once you have a way to call a Rust builtin from C++, which we had as part of the initial pull request.

    + +

    Where they connected to the main shell, we used one of three approaches:

    + +
    1. Add some FFI glue to the C++ to make it callable from Rust, port the caller and leave the callee for later
    2. Move the callee to Rust and, if necessary, make it callable from C++
    3. Write a Rust version of the callee and call it from the ported caller, but leave the C++ version around
    + +

    For instance, almost every builtin needs to parse its options. We have our own implementation of getopt, that we reimplemented in Rust in the initial PR, but the C++ version stuck around until it had no more callers remaining. Otherwise we would have had to write a C++-to-Rust bridge and adjust the C++ callers to use it.

    + +

    Or the builtin builtin needs access to the names of all builtins to print them for builtin --get-names. In that case we bridged some access to what amounts to a constant vector of strings in the C++, and eventually moved it over once the users were in Rust.

    + +

    That’s how it went for a while, but we finally hit the more entangled systems, where porting larger chunks felt more productive, since that reduced the amount of tricky FFI code to be written only to be thrown away. These were ported in solo efforts. This includes the input/output “reader”, which is, unsurprisingly, one of fish’s biggest parts, ending up at about 13000 lines of Rust.

    + +

    During the port, we hit a bunch of snags with (auto)cxx. Sometimes it would just not understand a particular C++ construct, and we spent a lot of time trying to figure out ways to please it. As an example, we introduced a struct on the C++ side that wrapped C++’s vector, because for some reason autocxx liked to complain about vector<wstring>. We had to fork it to add support for wstring/wchar, which is understandable because using wchar is a horrible decision - we only do it because it’s a historical mistake.

    + +

    Similarly, we had to wrap some C++ variables in unique_ptr and similar to make the ownership rules understandable to (auto)cxx, or copy values that didn’t strictly need to be copied. This caused the performance during the port to go down quite a bit, but we regained all of it in most spots, and even beat the C++ version in some.

    + +

    We also patched autocxx to remove the requirement to use unsafe to invoke any C++ API, because that would have obscured uses of unsafe that wouldn’t disappear just by porting the callee. We were building something temporary, so sometimes it is okay to do something a little underhanded. If you used this for a permanent bridge between Rust and C++ in a few parts of your code, the unsafe markers might be useful, but in our case they were noise.

    + +

    Because autocxx generated a lot of code, some tools also were less helpful than they’d usually be. rust-analyzer for instance was extremely slow.

    + +

    So, even though our codebase was fairly amenable to being moved to Rust because we didn’t use exceptions or a lot of templates, autocxx isn’t the easiest to work with. It is absolutely magical that it works at all, and it enabled us to do this port, but it has a hard task to perform and isn’t perfect at it.

    + +

    The Timeline

    + +
    • The initial PR was opened on 28th January 2023 and merged on 19th February 2023
    • fish 3.7.0, another release in the C++ branch to flush out some accumulated improvements, was released in January 2024
    • The last C++ code was removed in January 2024 (and some additional test code was ported from C++ to C on the 12th of June 2024)
    • The first beta was released on the 17th of December 2024
    + +

    The initial PR had a timeline of “handwaving, half a year”. It was clear to all of us that it might very well be entirely off, and we’re not disappointed that it was. Frankly, 14 months was still a pretty good pace, especially considering that we made a C++ release in-between, so it did not throw off our usual release cadence.

    + +

    Most of the work was done by 7 people (going by those with at least 10 commits to “.rs” files), but we got a lot of help from interested community members.

    + +

    The delay after that was down to a few reasons:

    + +
    1. The “second 90%” - testing that everything worked. We flushed out a lot of bugs in this time, and if we had made a release at that time it would have been a bad one.
    2. Having something to release that’s visible to users - there’s no point in making a release that does the same thing in new code, you need it to do different things. So we held off until we had something.
    3. Simple availability - sometimes, some of us took time off.
    + +

    So if you are trying to draw any conclusions from this, consider the context: A group of people working on a thing in their free time, diverting some effort to work on something else, and deciding that after the work is finished it actually isn’t.

    + +

    The Gripes

    + +

    It won’t surprise anyone who has spent any time on this world of ours that Rust is not, in fact, perfect. We have some gripes with it.

    + +

    Chief among them is how Rust handles portability. While it offers many abstractions over systems, allowing you to target a variety of systems with the same code, when it comes to adapting your code to systems at a lower-level, it’s all based on enumerating systems by hand, using checks like #[cfg(any(target_os = "freebsd", target_os = "netbsd", target_os = "openbsd"))].

    + +

    This is an imperfect solution, allowing you to miss systems and ignoring version differences entirely. From what we can tell, if FreeBSD 12 gains a function that we want to use, libc would add it, but calling it would then fail on FreeBSD 11 without a good way to check, at the moment.
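As a rough illustration of the enumeration problem (a hypothetical helper, not fish code), a hand-written target list looks like this. Any system that is left off the list, DragonFly BSD in this sketch, quietly takes the fallback, and nothing in the attribute can express “FreeBSD 12 or newer”.

```rust
/// Hypothetical helper: reports whether this build targets one of the
/// BSDs that were listed by hand.
#[cfg(any(target_os = "freebsd", target_os = "netbsd", target_os = "openbsd"))]
fn is_listed_bsd() -> bool {
    true
}

/// Fallback for every target that is not on the list. The condition has
/// to be repeated and negated by hand, and a BSD that was forgotten above
/// (for example `target_os = "dragonfly"`) ends up here without a warning.
#[cfg(not(any(target_os = "freebsd", target_os = "netbsd", target_os = "openbsd")))]
fn is_listed_bsd() -> bool {
    false
}
```

The two attribute lists have to be kept in sync manually, which is exactly the kind of duplication described here.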

    + +

    But listing targets in our code is also fundamentally duplicating work that the libc crate (in our case) has already done. If you want to call libc::X, which is only defined on systems A, B and C, you need to put in that check for A, B and C yourself and if libc adds system D you need to add it as well. Instead of doing that, we are using our own rsconf crate to do compile-time feature detection in build.rs.

    + +

    Most of this would be solved if Rust had some way of saying “compile this if that function exists” - #[cfg(has_fn = "fstatat")]. With that, the libc crate could do whatever checks it wants and fish would just follow what it did, and we could remove much of our need for rsconf. It would not really help support older distributions that lack some features, though. That could be solved by something like the min_target_API_version cfg.

    + +

    While we’re on portability, the tools also sometimes fail to consider other targets - clippy may warn about a conversion being useless when it isn’t useless on another system. It is also often better to use if cfg!(...) instead of #[cfg(...)], because code behind the latter is eliminated very early, so it may be entirely wrong, and the error only shows up when building on the affected system.
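The difference between the two forms can be sketched as follows. The function names are made up for illustration; the cfg! macro and the #[cfg] attribute themselves are standard Rust.

```rust
/// `cfg!(...)` expands to a plain `true` or `false`, so both branches
/// are parsed and type-checked on every platform. A typo in the Windows
/// branch is caught even when building on Linux.
fn path_separator() -> char {
    if cfg!(windows) {
        '\\'
    } else {
        '/'
    }
}

/// With `#[cfg(...)]`, the item that does not match the target is removed
/// before type checking. A mistake in the non-unix body below would go
/// unnoticed until someone builds on a non-unix system.
#[cfg(unix)]
fn platform_family() -> &'static str {
    "unix"
}

#[cfg(not(unix))]
fn platform_family() -> &'static str {
    "other"
}
```

The cfg! form only works when both branches can compile everywhere; code that names a function or constant that exists on one platform only still needs the attribute.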

    + +

    We’ve also had issues with localization - a lot of the usual Rust relies on format strings that are checked at compile-time, but unfortunately they aren’t translatable. We ported printf from musl, which we required for our own printf builtin anyway, which allows us to reuse our preexisting format strings at runtime.

    + +

    The Mistakes

    + +

    We’ve hit some false starts, dead ends and other kinds of mistakes. For instance we originally used a fancy macro to allow us to write our strings as "foo"L, but that did not end up carrying its weight and we removed it in favor of a regular L!("foo") macro call.

    + +

    We were confused by a deprecation warning in the libc crate, which explains that “time_t” will be switched to 64-bit on musl in the future. We initially tried to work around it, adding a lot of wrappers to try to stay agnostic on that size, but only later figured out that it does not affect us, as we do not pass a time_t we get from one C library to another. (https://github.com/fish-shell/fish-shell/issues/10634)

    + +

    Some bugs appeared because we missed subtleties of the original code. Often this turned into a crash because we used asserts or assert’s modern cousin “.unwrap()”. This was often the easiest way to translate the C++, but sometimes the assumption behind it simply turned out to be inaccurate, and it had to be replaced with different error handling.

    + +

    But overall most of these were, once found, pretty shallow - “it panics here, why would it do that? oh, this can be an Err? Okay, what leads to that? Ah, okay, let’s handle that in this way”.
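A minimal sketch of that kind of fix, with hypothetical function names rather than real fish code: the first version is the direct assert-style translation, the second hands the Err case to the caller, and the third shows one possible way for a caller to handle it.

```rust
use std::num::ParseIntError;

/// Direct translation of an assert-style C++ check: any input that is
/// not a valid number panics at the `unwrap()`.
fn parse_count_or_panic(arg: &str) -> u32 {
    arg.trim().parse::<u32>().unwrap()
}

/// After the fix: the failure is part of the return type, so the caller
/// has to decide what a bad argument means.
fn parse_count(arg: &str) -> Result<u32, ParseIntError> {
    arg.trim().parse::<u32>()
}

/// One possible caller: fall back to a default instead of crashing.
fn count_or_default(arg: &str, default: u32) -> u32 {
    parse_count(arg).unwrap_or(default)
}
```

Because the panic points at the exact unwrap that failed, finding the spot is usually the easy part; the remaining work is deciding what the right behavior for the Err case is.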

    + +

    We’ve also caused some friction by turning on link-time-optimization combined with having release builds as the default in CMake (currently needed to run the full test suite), which makes it easy to accidentally end up with very long build times.

    + +

    The Good

    + +

    A lot of the benefits of porting to Rust will appear over time, but some are already here.

    + +

    Remember our issues with (n)curses? We will no longer have any, because we no longer use curses. Instead we switched to a Rust crate that gives us just what we need, which is access to terminfo and expanding its sequences. This removes some awkward global state, and means those building from source no longer need to ensure that curses is installed “correctly” on their system - cargo just downloads a crate and builds it.

    + +

    We do still read terminfo, which means users need to install that, but that can be done at runtime, is preinstalled on all mainstream systems and if it can’t be found we just use an included copy of the xterm-256color definitions4.

    + +

    We have also managed to create “self-installable” fish packages that include all the functions, completions and other asset files in the fish binary to be written out at runtime. That allowed us to create statically linked versions of fish (for Linux this uses musl, because glibc has unavoidable crashes!), so for the first time we have one file you can download and run on any Linux (the only requirement being that the architecture matches!).

    + +

    This is a pretty big boon for people who want to use fish but sometimes ssh to servers, where they might not have root access to install a package. So they can just scp a single file and it’s available.

    + +

    This might be possible with C23’s #embed, but Rust allowed us to do it now and, overall, pretty easily.

    + +

    The Sad

    + +

    The one goal of the port we did not succeed in was removing CMake.

    + +

    That’s because, while cargo is great at building things, it is very simplistic at installing them. Cargo wants everything in a few neat binaries, and that isn’t our use case. Fish has about 1200 .fish scripts (961 completions, 217 associated functions), as well as about 130 pages of documentation (as html and man pages), and the web-config tool and the man page generator (both written in python).

    + +

    It also has a test suite that is light on unit tests but heavy on end-to-end script and interactive tests. The scripted tests run through our own littlecheck tool, which runs a script and compares its output to embedded comments. The interactive tests are driven by pexpect, which fakes terminal interaction and checks that the right thing happens when you press buttons.

    + +

    We kept cmake, in a simplified form, for these tasks, but let it hand over the responsibility of building to cargo.

    + +

    It would be possible to switch all that to a simpler task runner like Just or even plain old makefiles, but since we already have this system we’re keeping it for now. The upside is that the build process hasn’t really changed for packagers.

    + +

    We’re also losing Cygwin as a supported platform for the time being, because there is no Rust target for Cygwin and so no way to build binaries targeting it. We hope that this situation changes in the future, but we had also hoped it would improve during the almost two years of the port. For now, the only way to run fish on Windows is to use WSL.

    + +

    The Present & The Future

    + +

    We’ve succeeded. This was a gigantic project and we made it. The sheer scale of this is perhaps best expresed in numbers:

    + +
    • 1155 files changed, 110247 insertions(+), 88941 deletions(-) (excluding translations)
    • 2604 commits by over 200 authors
    • 498 issues
    • Almost 2 years of work
    • 57K Lines of C++ to 75K Lines of Rust 5 (plus 400 lines of C 6)
    • C++–
    + +

    The beta works very well. Performance is usually slightly better in terms of time taken, memory use has a slightly higher floor but a lower ceiling - it will use 8M instead of 7M at rest, but e.g. globbing a big directory won’t make it go up as much. These things can all be improved, of course, but for a first result it is encouraging.

    + +

    Fish is still a bit of an odd duck…fish as a Rust program. It has some bits that smell like C spirit, directly using the C API and e.g. passing around file descriptors instead of File objects. It still uses UTF-32 strings - which is why we are using a fork of the pcre2 crate because we couldn’t convince the pcre2-crate maintainer to add UTF-32 support. We hope to find a nicer solution here, but it wasn’t necessary for the first release.

    + +

    The port wasn’t without challenges, and it did not all go entirely as planned. But overall, it went pretty dang well. We’re now left with a codebase that we like a lot more, that has already gained some features that would have been much more annoying to add with C++, with more on the way, and we did it while creating a separate 3.7 release that also included some cool stuff.

    + +

    And we had fun doing it.

    + +
    + +
    +
    1. We rely on contributions from as diverse a set of people as we can for our completion scripts. We can only really get a completion script for a tool from someone who knows that tool. And ideally, they would also test their script with the newest source from git - both to get more testing and to take advantage of new features we introduce. So we want to make this as painless as possible. This is working rather well, overall - we have over 1000 completion scripts in our codebase.

    2. That is assuming that there isn’t a correlation between running fish and using an unusual processor architecture. Also this includes Hurd and kFreeBSD.

    3. Technically the first part of fish to be switched to rust is our widecharwidth library, which already had a rust port that is used in Wezterm.

    4. We have discussed switching to not reading terminfo at all because in practice it is almost entirely useless. (we could write another 3000 words on the topic, but the short of it is that it is slow to update and integrate new features, often wrong, has no versioning mechanism and, most importantly, documents differences that barely exist anymore in the types of terminals that people actually use)

    5. A lot of the increase in line count can be explained by rustfmt’s formatting, as it likes to spread code out over multiple lines, like:

           if opts.show
               && (opts.local
                   || opts.function
                   || opts.global
                   || opts.erase
                   || opts.list
                   || opts.exportv
                   || opts.universal)

       which was one line in our C++ version.

       The rest is additional features.

       Also note that our Rust code is in some places a straight translation of the C++, and fully idiomatic Rust might be shorter.

    6. We use C in three places:

       • To connect some functions or variables that aren’t (yet) in the libc crate
       • To do compile-time feature detection
       • In our fish_test_helper binary, which mocks some unix behaviors for tests (things like “print blocked signals” or “acquire the terminal”)
    +
    + +
    + + From d6c70a66cac8a435b4f1bf9e8facace099fbec64 Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Sat, 28 Dec 2024 10:43:59 +0100 Subject: [PATCH 14/15] Typos --- docs/blog/rustport/index.html | 8 ++++---- site/_posts/2024-12-28-rustport.md | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/blog/rustport/index.html b/docs/blog/rustport/index.html index ad3cf387b..be3f44007 100644 --- a/docs/blog/rustport/index.html +++ b/docs/blog/rustport/index.html @@ -70,7 +70,7 @@

    Why are we doing this again?

    Here’s a dirty secret: while external commands run in parallel, fish’s execution of internal commands (builtins and functions) is currently serial and can’t be backgrounded. Lifting this limitation will enable features like asynchronous prompts or non-blocking completions, as well as performance gains.

    POSIX shells use subshells to get around this, but subshells are a leaky abstraction that can bite you in the behind when you least expect it. -For instance you can’t set variables from inside a pipe (except on some shells, but only in the last part of the pipe, maybe, if you have enabled the correct option). +For instance, you can’t set variables from inside a pipe (except on some shells, but only in the last part of the pipe, maybe, if you have enabled the correct option). We would like to avoid that, and so the heavy hand of forking off a process isn’t appealing.

    We prototyped true multithreaded execution in C++, but it just didn’t work out. For example, it was too easy to accidentally share objects across threads, with only post-hoc tools like Thread Sanitizer to prevent it.

    @@ -118,11 +118,11 @@

    Platform Support

    A lot of hay has also been made online about Rust’s platform support (e.g. in the git project). We don’t see a big problem here - all of our big platforms (macOS, Linux, the BSDs) are supported, as are Opensolaris/Illumos and Haiku. We have never heard of anyone trying to run fish on NonStop.

    -

    Architecture support is even less of a problem - going by debian’s popcon, 99.9995% (the actual result, not an exaggeration) of machines run an architecture that has Rust packages in Debian. Given that fish is installed on 1.92% of Debian systems, we would project two (2) or three (3) machines of the quarter million responses to have fish on an unsupported architecture 2.

    +

    Architecture support is even less of a problem - going by Debian’s popcon, 99.9995% (the actual result, not an exaggeration) of machines run an architecture that has Rust packages in Debian. Given that fish is installed on 1.92% of Debian systems, we would project two (2) or three (3) machines of the quarter million responses to have fish on an unsupported architecture 2.

    Unlike what some online have assumed, a native Windows port was not a reason for switching to Rust as it was never in the cards. Fish is, at heart, a UNIX shell that relies not only on UNIX APIs but also their semantics, and exposes them in the scripting language. What would test -x say on Windows, which has no executable bit? These are issues that could be solved with a lot of work, but we’re unix nerds making a unix shell, not one for Windows.

    -

    The one platform we care about a bit that it does not currently seem to have enough support for is Cygwin, which is sad but we have to make a cut somewhere.

    +

    The one platform we care about a bit that it does not currently seem to have enough support for is Cygwin, which is sad, but we have to make a cut somewhere.

    The Story Of The Port

    @@ -277,7 +277,7 @@

    The Sad

    The Present & The Future

    -

    We’ve succeeded. This was a gigantic project and we made it. The sheer scale of this is perhaps best expresed in numbers:

    +

    We’ve succeeded. This was a gigantic project and we made it. The sheer scale of this is perhaps best expressed in numbers:

    • 1155 files changed, 110247 insertions(+), 88941 deletions(-) (excluding translations)
    • diff --git a/site/_posts/2024-12-28-rustport.md b/site/_posts/2024-12-28-rustport.md index 010261767..1c1096a53 100644 --- a/site/_posts/2024-12-28-rustport.md +++ b/site/_posts/2024-12-28-rustport.md @@ -44,7 +44,7 @@ and one long-term project is to add concurrency to the language. Here’s a dirty secret: while external commands run in parallel, fish’s execution of internal commands (builtins and functions) is currently serial and can't be backgrounded. Lifting this limitation will enable features like asynchronous prompts or non-blocking completions, as well as performance gains. POSIX shells use subshells to get around this, but subshells are a leaky abstraction that can bite you in the behind when you least expect it. -For instance you can't set variables from inside a pipe (except on some shells, but only in the last part of the pipe, maybe, if you have enabled the correct option). +For instance, you can't set variables from inside a pipe (except on some shells, but only in the last part of the pipe, maybe, if you have enabled the correct option). We would like to avoid that, and so the heavy hand of forking off a process isn't appealing. We prototyped true multithreaded execution in C++, but it just didn't work out. For example, it was too easy to accidentally share objects across threads, with only post-hoc tools like Thread Sanitizer to prevent it. @@ -92,11 +92,11 @@ We did not do a comprehensive survey of other languages. We were confident Rust A lot of hay has also been made online about Rust's platform support (e.g. [in the git project](https://lwn.net/Articles/998115/)). We don't see a big problem here - all of our big platforms (macOS, Linux, the BSDs) are supported, as are Opensolaris/Illumos and Haiku. We have never heard of anyone trying to run fish on NonStop. 
-Architecture support is even less of a problem - going by [debian's popcon](https://popcon.debian.org/), 99.9995% (the actual result, not an exaggeration) of machines run an architecture that has Rust packages in Debian. Given that fish is [installed on 1.92% of Debian systems](https://qa.debian.org/popcon.php?package=fish), we would project two (2) or three (3) machines of the quarter million responses to have fish on an unsupported architecture [^stats]. +Architecture support is even less of a problem - going by [Debian's popcon](https://popcon.debian.org/), 99.9995% (the actual result, not an exaggeration) of machines run an architecture that has Rust packages in Debian. Given that fish is [installed on 1.92% of Debian systems](https://qa.debian.org/popcon.php?package=fish), we would project two (2) or three (3) machines of the quarter million responses to have fish on an unsupported architecture [^stats]. Unlike what some online have assumed, a native Windows port was not a reason for switching to Rust as it was never in the cards. Fish is, at heart, a UNIX shell that relies not only on UNIX APIs but also their semantics, and exposes them in the scripting language. What would `test -x` say on Windows, which has no executable bit? These are issues that *could* be solved with a lot of work, but we're unix nerds making a unix shell, not one for Windows. -The one platform we care about a bit that it does not currently seem to have enough support for is Cygwin, which is sad but we have to make a cut somewhere. +The one platform we care about a bit that it does not currently seem to have enough support for is Cygwin, which is sad, but we have to make a cut somewhere. ## The Story Of The Port @@ -240,7 +240,7 @@ For now, the only way to run fish on Windows is to use WSL. ## The Present & The Future -We've succeeded. This was a gigantic project and *we made it*. The sheer scale of this is perhaps best expresed in numbers: +We've succeeded. 
This was a gigantic project and *we made it*. The sheer scale of this is perhaps best expressed in numbers: - 1155 files changed, 110247 insertions(+), 88941 deletions(-) (excluding translations) - 2604 commits by over 200 authors From 8be87f3d2ffdb9108b5914bce3631134fca5155e Mon Sep 17 00:00:00 2001 From: Fabian Boehm Date: Sat, 28 Dec 2024 10:46:23 +0100 Subject: [PATCH 15/15] builtin builtin builtin builtin builtin builtin --- docs/blog/rustport/index.html | 2 +- site/_posts/2024-12-28-rustport.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/blog/rustport/index.html b/docs/blog/rustport/index.html index be3f44007..981004106 100644 --- a/docs/blog/rustport/index.html +++ b/docs/blog/rustport/index.html @@ -153,7 +153,7 @@

      The Story Of The Port

      For instance, almost every builtin needs to parse its options. We have our own implementation of getopt, that we reimplemented in Rust in the initial PR, but the C++ version stuck around until it had no more callers remaining. Otherwise we would have had to write a C++-to-Rust bridge and adjust the C++ callers to use it.

      -Or the builtin builtin needs access to the names of all builtins to print them for builtin --get-names. In that case we bridged some access to what amounts to a constant vector of strings in the C++, and eventually moved it over once the users were in Rust.

      +Or the builtin builtin (the builtin called builtin) needs access to the names of all builtins to print them for builtin --get-names. In that case we bridged some access to what amounts to a constant vector of strings in the C++, and eventually moved it over once the users were in Rust.

      That’s how it went for a while, but we finally hit the more entangled systems, where porting larger chunks felt more productive, since that reduced the amount of tricky FFI code to be written only to be thrown away. These were ported in solo efforts.
diff --git a/site/_posts/2024-12-28-rustport.md b/site/_posts/2024-12-28-rustport.md
index 1c1096a53..69ac08c2f 100644
--- a/site/_posts/2024-12-28-rustport.md
+++ b/site/_posts/2024-12-28-rustport.md
@@ -125,7 +125,7 @@ Where they connected to the main shell, we used one of three approaches:

 For instance, almost every builtin needs to parse its options. We have our own implementation of getopt, that we reimplemented in Rust in the initial PR, but the C++ version stuck around until it had no more callers remaining. Otherwise we would have had to write a C++-to-Rust bridge and adjust the C++ callers to use it.

-Or the `builtin` builtin needs access to the names of all builtins to print them for `builtin --get-names`. In that case we bridged some access to what amounts to a constant vector of strings in the C++, and eventually moved it over once the users were in Rust.
+Or the `builtin` builtin (the builtin called `builtin`) needs access to the names of all builtins to print them for `builtin --get-names`. In that case we bridged some access to what amounts to a constant vector of strings in the C++, and eventually moved it over once the users were in Rust.

 That's how it went for a while, but we finally hit the more entangled systems, where porting larger chunks felt more productive, since that reduced the amount of tricky FFI code to be written only to be thrown away. These were ported in solo efforts.