A web-based data-to-audio cassette modem. View it live: https://nycki93.github.io/cassette-modem
By "modem" I mean modulator/demodulator, similar to a dial-up internet modem or a fax modem. Those devices turn data into audio and send it over a telephone wire to get it to their destination. Instead of that, I'm 'sending' my data to an audio cassette tape, which I can play back later to 'receive' it.
Because cassettes are cool! I was watching my partner play the excellent monster-fighting RPG, Cassette Beasts, in which you use audio cassettes to record bootleg copies of wild monsters and then morph yourself into them. It's really good and a fresh take on the genre.
Anyway, it made me really nostalgic for cassette tapes. When I was growing up (circa 2000) I didn't really record stuff of my own, I just listened to books on tape from the library. So, recently (circa 2024) I bought a used tape recorder and some Maxell blanks online and started playing around with them. I'd heard of people using cassettes for data storage back in the Commodore 64 era. The obvious next question is "okay, so how do I do that now?" And there are some options that exist, but I wasn't super happy with them so I'm making my own! More on that later.
Probably!
The first, most obvious choice is to go find some software used for Commodore 64 preservation, since that's the last well-known computer that actually used audio cassettes for storage. There's a list on Wikipedia here. This might be my best option, but most of these look like they're Windows-specific and I'd like to be able to use an Android phone or some sort of Linux box as a portable modem.
The second, slightly more clever choice is to borrow some software used by Ham Radio operators for sending data over the air, and repurpose it to send data over a wire to my cassette recorder. The best option if I do it this way is probably direwolf, although it seems like it expects to run on a network so I might also need socat to get it pointed at the right output? More research required.
The third option is to use minimodem, a very clever piece of software that seems specifically tailored to this type of problem. It takes data in, produces modem noise out, and vice versa, in a number of formats. The main reason I'm not already using this is that it doesn't give me any way to catch and correct errors in the data. Maybe I could use redupe for this? It hasn't been updated in a while but maybe it doesn't need to be. More research needed.
And of course, the fourth option is to write the damn thing myself, for fun and experience, which is what I'm doing now. I'm writing it in JavaScript so I can easily host this demo on the web, but if I like what I get then I might rewrite it as a C application later.
Okay, let's say I think I can beat the pros, that somehow I can come up with something more efficient, or at least more user-friendly, than anything invented between then and now. When is "then", exactly?
The oldest spec I can find is the Kansas City Standard, as originally defined in 1976 by this article in a tech magazine. KCS is a simple and elegant solution to the problem of storing a small file, about 10 kB, on a computer with no hard disk or floppy drive. It's designed to be implementable in hardware, using a clock and some capacitors and stuff to measure how many audio waves pass through a sensor per millisecond. The baud rate is 300, meaning you send 300 symbols per second. A symbol can be anything, but in this case it's a cluster of waves. A cluster of 4 waves at 1200 Hz is the space symbol, a logical zero, and a cluster of 8 waves at 2400 Hz is the mark symbol, a logical one. Either way, a symbol lasts exactly 1/300 of a second. By stringing together space and mark symbols, you can send binary data, at a maximum rate of 1 bit per symbol, or 300 bits per second.
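Here's a rough sketch of what those two symbols look like as digital audio. It assumes a 44100 Hz sample rate and plain sine waves, and every name in it (`kcsSymbol`, `countCycles`, `kcsDecodeSymbol`) is made up for illustration; it isn't taken from the KCS article or from this project's actual code.

```javascript
// Sketch of KCS-style symbols as audio samples. All names are hypothetical.
const SAMPLE_RATE = 44100; // samples per second (assumed)
const BAUD = 300;          // symbols per second, per the text

// One symbol's worth of sine-wave samples for a single bit.
// 0 (space) = 4 cycles at 1200 Hz, 1 (mark) = 8 cycles at 2400 Hz.
function kcsSymbol(bit) {
  const freq = bit ? 2400 : 1200;
  const length = SAMPLE_RATE / BAUD; // 147 samples per symbol
  const samples = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    samples[i] = Math.sin((2 * Math.PI * freq * i) / SAMPLE_RATE);
  }
  return samples;
}

// Count falling zero crossings. Each full sine cycle that starts at
// phase zero has exactly one, so this counts the cycles in a symbol.
function countCycles(samples) {
  let cycles = 0;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i - 1] > 0 && samples[i] <= 0) cycles++;
  }
  return cycles;
}

// Decide which symbol we heard: closer to 8 cycles means 1, else 0.
function kcsDecodeSymbol(samples) {
  return countCycles(samples) >= 6 ? 1 : 0;
}
```

This only works on a clean signal that's already lined up on a symbol boundary. Real tape audio would need noise handling and some way of finding where each symbol starts.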
In practice, though, if you tried to send bits at this maximum rate, eventually your clock would de-sync and you'd end up in the middle of a bit somewhere and then you'd be reading garbage until your clock happens to sync up again. So to make the signal more reliable, KCS uses 'padding' bits to mark the beginning and end of each byte. If your clock de-syncs, it should sync back up naturally by catching the beginning of a byte. This padding is sometimes called a UART frame, and the simplest possible UART frame has the format 0 xxxx xxxx 1, where the x's are your actual data and the 0 and 1 are the frame. The advantage of this is that you know a byte will always start with a 0 and end with a 1, making it possible to re-sync if you lose your place. The downside is that now you're using 20% of your data for padding, so instead of 300 bits per second, you're only sending 240 data bits per second.
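The framing idea can be sketched in a few lines. This assumes the data bits go out least-significant-bit first, which is the usual serial convention but is my assumption here, and the function names are hypothetical.

```javascript
// Sketch of the simplest UART frame: 0, eight data bits, 1.
// Bit order (LSB first) is an assumption; names are hypothetical.
function frameByte(byte) {
  const bits = [0]; // start bit
  for (let i = 0; i < 8; i++) {
    bits.push((byte >> i) & 1); // data bits, least significant first
  }
  bits.push(1); // stop bit
  return bits;
}

// Walk a stream of bits and pull out every 10-bit window that looks
// like a valid frame. If the window doesn't start with 0 and end
// with 1, slide forward one bit and try again (the "re-sync").
function unframeBits(bits) {
  const bytes = [];
  let i = 0;
  while (i + 10 <= bits.length) {
    if (bits[i] === 0 && bits[i + 9] === 1) {
      let byte = 0;
      for (let j = 0; j < 8; j++) {
        byte |= bits[i + 1 + j] << j;
      }
      bytes.push(byte);
      i += 10;
    } else {
      i += 1;
    }
  }
  return bytes;
}
```

Note that the re-sync is only a heuristic: a 0 and a 1 that happen to sit ten bits apart inside the data can look like a frame too, so a decoder may read a few garbage bytes before it locks back on.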
Just because you can find your place again doesn't mean you'll never lose it in the first place. To correct small random errors, the simplest strategy is to make two complete copies of your data, one after the other. This cuts the effective data rate in half again to 120 bits per second. So to store a 10 kB file would take...
- 10 kB
- 10 * 1024 * 8 = 81,920 bits
- at 120 bits per second
- equals 682.67 seconds, or 11:23 minutes.
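The same arithmetic as a tiny calculator, in case you want to plug in other numbers. The function names and default parameters are just illustrative.

```javascript
// Sketch: how long does a file take on tape?
// Names and defaults are hypothetical, chosen to match the text.
function tapeSeconds(bytes, rawBitsPerSecond, frameBits = 10, copies = 2) {
  // Only 8 of every frameBits bits are data, and each data bit is
  // sent `copies` times.
  const dataBitsPerSecond = (rawBitsPerSecond * 8) / frameBits / copies;
  return (bytes * 8) / dataBitsPerSecond;
}

// Format seconds as minutes:seconds, rounded to the nearest second.
function formatMinSec(seconds) {
  const total = Math.round(seconds);
  const minutes = Math.floor(total / 60);
  return `${minutes}:${String(total % 60).padStart(2, "0")}`;
}

console.log(formatMinSec(tapeSeconds(10 * 1024, 300))); // 11:23
```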
If most of your files are smaller than 10 kB, then 11 minutes isn't so bad. But there's definitely room for improvement. I'd like to store a video game on a cassette, something like Galactic Foodtruck Simulator 2999, a text-art game released by WiL in 2023 using a game engine called ZZT from 1991. Since it's mostly text, it compresses nicely. The Linux version, compressed, is only a 448 kB download, less than 1 MB. At 11:23 minutes per 10 kB, that's just about 8:30 hours of tape.
8:30 hours. 510 minutes. A single audio cassette only holds 90 minutes, and that's if you flip it over and use both sides. You can cheat a little bit if you have a cassette player that runs slow, but probably not enough to make it run for 8 hours.
We've got options though. See, KCS at 300 baud is actually an extremely cautious standard. We're sending clusters of 4 or 8 waves. We could send clusters of 1 or 2 waves, and instantly cut the time to a quarter, getting down to 127.5 minutes. That's about three cassette sides.
Instead of sending 1 or 2 waves at a time, what if we only ever send 1 wave, but it can be a short or long one? Before, every symbol lasted 2 short wavelengths; now our "average" symbol will be 1.5 times the shorter wavelength. This cuts our expected time to 3/4 of what it was, all the way down to about 95.6 minutes, just barely too long to fit on a single cassette.
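A minimal sketch of that short-wave/long-wave idea follows. The text doesn't say which bit gets the short wave, so this assumes 1 is short and 0 is long, with a 2400 Hz short wave at a 44100 Hz sample rate. All names are hypothetical.

```javascript
// Sketch of one-wave-per-bit encoding. Which bit is "short" is an
// assumption (1 = short, 0 = long); names are hypothetical.
const SAMPLE_RATE = 44100;

// One full sine cycle for a single bit.
function pulseSamples(bit, shortHz = 2400) {
  const hz = bit ? shortHz : shortHz / 2;
  // Rounded to a whole number of samples, which nudges the pitch
  // slightly away from the nominal frequency.
  const length = Math.round(SAMPLE_RATE / hz);
  const samples = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    samples[i] = Math.sin((2 * Math.PI * i) / length);
  }
  return samples;
}

// Concatenate one pulse per bit into a single buffer.
function encodeBits(bits) {
  const pulses = bits.map((bit) => pulseSamples(bit));
  const total = pulses.reduce((sum, p) => sum + p.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const p of pulses) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}
```

One side effect worth noticing: the length of the recording now depends on the data. A file full of long-wave bits takes twice as long as one full of short-wave bits, so the 1.5x average only holds if the two are roughly balanced.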
Do we really need to send everything twice? What if we send the first half of the file, then the second half, then the xor of the two halves? Then we'd effectively have three "blocks" of data, such that we can recover the entire file from any two of them. Three half-size blocks add up to 1.5 times the file instead of 2 times, which cuts our roughly 95-minute estimate down to about 72 minutes, enough to fit on one cassette. In fact, if we implement Reed-Solomon error correction, we can do even better than xor, and split our data into an arbitrary (n+k)-many blocks, such that we can lose k-many blocks and still recover the data from the remaining n-many. Let's say we expect to lose 1% of our data. To be safe, we'll encode the file at 5% redundancy. If those 95 or so minutes were 100% redundancy, then cutting that down to 5% redundancy cuts our time down to just 50 minutes. That almost fits on one side of the tape. This is looking doable.
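The xor trick is small enough to sketch directly. This covers only the three-block xor scheme, not Reed-Solomon, and the function names are hypothetical. It pads the second half with a zero byte when the file length is odd, which is my own choice for the sketch.

```javascript
// Sketch of the three-block xor scheme. Names are hypothetical.
function xorBlocks(a, b) {
  const out = new Uint8Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

// Split data (a Uint8Array) into two halves plus their xor.
function encodeWithParity(data) {
  const half = Math.ceil(data.length / 2);
  const first = data.slice(0, half);
  const second = new Uint8Array(half); // zero-padded if length is odd
  second.set(data.slice(half));
  return [first, second, xorBlocks(first, second)];
}

// Rebuild the file from [first, second, parity], where at most one
// entry is null (lost). The original length trims any padding.
function recover(blocks, originalLength) {
  let [first, second, parity] = blocks;
  const lost = blocks.filter((b) => b === null).length;
  if (lost > 1) throw new Error("need at least two of the three blocks");
  if (first === null) first = xorBlocks(second, parity);
  if (second === null) second = xorBlocks(first, parity);
  const out = new Uint8Array(first.length + second.length);
  out.set(first);
  out.set(second, first.length);
  return out.slice(0, originalLength);
}
```

For example, `recover([null, second, parity], data.length)` rebuilds the file with the first half missing entirely. The catch is that you have to know which block is bad, which is a separate problem (checksums, most likely).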
We're sending 1 or 2 waves at 2400 or 1200 Hz. Can we go faster? Digitally sampling at 44100 times per second, we can theoretically recover waves much higher pitched than that, up to the Nyquist limit of 22050 Hz. For reference, the highest pitched key on a piano is C8, officially tuned to 4186.01 Hz. Let's optimistically say we can get away with a 4800 Hz wave, which, at 44100 samples per second, means we only get about 9 samples to detect each wave.
Transmitting short and long waves at 4800 Hz and 2400 Hz, the average symbol lasts 1.5 times the shorter wavelength, which gives us an average bit rate of 4800 / 1.5 = 3200 bits per second. We're using 20% of that for byte frames, and of the remainder, we're using about 5% for redundancy. That gives us a maximum of about 2430 data bits per second, or roughly 18 kB per minute, so for our 448 kB payload... about 25 minutes. That'll fit! This is doable!
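Here's a small sketch for recomputing this kind of estimate from symbol durations. It assumes zeros and ones are equally common, so the average symbol is the mean of one short and one long wave, and it treats redundancy as extra data on top of the payload. The function names and defaults are hypothetical, and the exact result depends on how you take the average and apply the redundancy, so treat it as a ballpark.

```javascript
// Sketch: estimate throughput for short/long wave symbols.
// Names and defaults are hypothetical.

// Average bits per second when half the symbols are one short wave
// and half are one long wave. This averages the durations, not the
// frequencies, because time on tape is what adds up.
function averageBitRate(shortHz, longHz) {
  const averageSeconds = (1 / shortHz + 1 / longHz) / 2;
  return 1 / averageSeconds;
}

// Minutes of tape for a payload, after 10-bit byte frames and a
// fractional redundancy overhead (0.05 = 5% extra data).
function minutesForPayload(kilobytes, rawBitsPerSecond, frameBits = 10, redundancy = 0.05) {
  const dataBitsPerSecond = (rawBitsPerSecond * 8) / frameBits / (1 + redundancy);
  return (kilobytes * 1024 * 8) / dataBitsPerSecond / 60;
}

const rate = averageBitRate(4800, 2400);
console.log(Math.round(rate), "bits per second raw");
console.log(minutesForPayload(448, rate).toFixed(1), "minutes for 448 kB");
```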
We've been making a lot of optimistic assumptions here about how we're going to do data correction. But let's be honest, if we just send the entire file in three huge blocks, we're not going to get it back, right? We're going to have errors in all three blocks. We need to divide the file into a lot more pieces, so that we can recover it from two of the three pieces from each group.
And this is where I was introduced to the OSI Model of data transfer, or as I sometimes call it, the Seven Layer Burrito.
Layer 1 of this model, the physical layer, is when you operate on individual bits and symbols. By stuffing those bits into 10-bit frames, we've implemented a rudimentary 2nd layer, the Data Link layer. Now we want to group those byte-frames together into chunks that can be reliably assembled into a file. These are called Layer 3 (Network Packets) and Layer 4 (Transport Datagrams).
I haven't finished designing this yet! I'm planning to use KISS Packets for my framing, and RAID 4-style data blocks, most likely.
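Since the design isn't final, this is only a sketch of what standard KISS framing looks like, not what this project ends up doing. The special byte values (`0xC0`, `0xDB`, `0xDC`, `0xDD`) come from the KISS protocol. The function names are hypothetical, and using command byte `0x00` (a data frame on port 0) is an assumption.

```javascript
// Sketch of KISS framing. Byte values are from the KISS protocol;
// function names and the choice of command byte are assumptions.
const FEND = 0xc0;  // frame end (also marks frame start)
const FESC = 0xdb;  // escape
const TFEND = 0xdc; // escaped FEND
const TFESC = 0xdd; // escaped FESC

// Wrap a payload in FEND markers, escaping any FEND or FESC bytes
// inside it so they can't be mistaken for frame boundaries.
function kissEncode(payload) {
  const out = [FEND, 0x00]; // 0x00 = data frame, port 0 (assumed)
  for (const byte of payload) {
    if (byte === FEND) out.push(FESC, TFEND);
    else if (byte === FESC) out.push(FESC, TFESC);
    else out.push(byte);
  }
  out.push(FEND);
  return Uint8Array.from(out);
}

// Unwrap exactly one frame of the form FEND, command, data..., FEND.
function kissDecode(frame) {
  const out = [];
  for (let i = 2; i < frame.length - 1; i++) {
    if (frame[i] !== FESC) {
      out.push(frame[i]);
      continue;
    }
    i++; // look at the byte after the escape
    if (frame[i] === TFEND) out.push(FEND);
    else if (frame[i] === TFESC) out.push(FESC);
    else throw new Error("invalid escape sequence");
  }
  return Uint8Array.from(out);
}
```

The appeal for tape is that `0xC0` can only ever mean "frame boundary", so a decoder that loses its place can skip ahead to the next one and pick up again from there.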