Why doesn't BTRFS offer live deduplication like ZFS? #304
A brief history of Linux dedupe implementations
The ZFS approach does avoid duplicate data writes, and it's almost unique among dedupe strategies in that respect. It also has far higher memory and metadata IO requirements than any other dedupe strategy, all else being equal. The design is a naive content-addressable data store, despite the enormous resources available to Sun's developers at the time and the active contemporary research into alternative dedupe strategies like VDO and reflink. The conventional guidance for ZFS users is to never use dedupe, especially after reflink support became available in ZFS, and despite recent ZFS dedupe improvements.
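To make the cost concrete, here is a minimal sketch of an in-band, content-addressed dedupe store (not ZFS code; `BlockStore` and the block size are made up for illustration). Every write is hashed and looked up before it hits the disk, and the table grows with every unique block, which is where the RAM and metadata IO go.

```python
# Minimal sketch of an in-band, content-addressed dedupe store.
# Not ZFS code; BLOCK_SIZE and BlockStore are made-up names for illustration.
import hashlib

BLOCK_SIZE = 128 * 1024  # illustrative record size


class BlockStore:
    def __init__(self):
        self.ddt = {}      # block hash -> physical address; one entry per unique block
        self.blocks = []   # stands in for the actual on-disk storage

    def write(self, data: bytes) -> int:
        """Return the address the caller should reference for this block."""
        digest = hashlib.sha256(data).digest()
        addr = self.ddt.get(digest)
        if addr is not None:
            return addr    # duplicate: nothing is written at all
        addr = len(self.blocks)
        self.blocks.append(data)
        self.ddt[digest] = addr  # the table grows with every *unique* block,
        return addr              # so memory and metadata IO scale with unique data
```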
My guess is that in-band dedupe in btrfs simply isn't worth the effort:
(edit) Oh, I almost forgot about VDO, the Other Inline Deduper. VDO's design, grossly oversimplified, holds a hash table in memory which is used to locate and remap duplicate 4K blocks. VDO's main problem is that blindly deduping at 4K block size leads to massive fragmentation due to nuisance dedupes, which ultimately kills performance. Naively deduping a block device at a block size greater than 4K leads to almost zero hit rates, because dedupe can only occur when a filesystem extent aligns with dedupe blocks at fixed locations.

The main complaint I used to get about bees from VDO users was that bees (up to v0.10) and VDO were too similar: because they both could dedupe on 4K boundaries, they both fragmented the filesystem to exhaustion. bees v0.11 fixes that by adding heuristics to decide whether bees should dedupe smaller extents. It looks like this could be fixed in VDO the same way, but for whatever reason the maintainer hasn't chosen to do that yet (there are GitHub issues on kvdo about this, closed 6 years ago).
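Purely for illustration, here is a rough sketch of the kind of size heuristic described above; the threshold and the function name are hypothetical, not bees' actual policy.

```python
# Rough sketch of a "don't bother" heuristic for small matches.
# MIN_DEDUPE_LEN and worth_deduping are hypothetical, not bees' actual policy.
MIN_DEDUPE_LEN = 128 * 1024   # illustrative threshold (32 x 4K blocks)


def worth_deduping(match_len: int, extent_len: int) -> bool:
    """Dedupe only when the saving justifies splitting the surrounding extent."""
    # Nuisance dedupe: a lone 4K match inside a large extent would split that
    # extent into three pieces for a negligible space saving, so skip it.
    if match_len < MIN_DEDUPE_LEN and match_len < extent_len:
        return False
    return True
```

The trade-off is giving up a little space saving on small matches in exchange for keeping large extents intact.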
Really high-level question, I know. I really appreciate this project, but I can't stop thinking: if the hash table were managed by the kernel, and async I/O (AIO) were used in the kernel to spool the writes and check extents prior to writing, couldn't the initial write be deduped instead of being rewritten?
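Something like this toy sketch is what I have in mind (Python standing in for pseudocode; asyncio stands in for in-kernel AIO, and the names and the commit/remap callbacks are all made up for illustration):

```python
# Toy model of the idea, not a kernel design: writes are spooled to a queue,
# hashed, and checked against a table *before* the data is committed, so a
# duplicate becomes a reference instead of a rewrite. asyncio stands in for
# in-kernel AIO; hash_table, commit and remap are made-up names.
import asyncio
import hashlib

hash_table = {}                 # digest -> existing extent address
write_queue = asyncio.Queue()   # spooled (offset, data) writes


async def writeback_worker(commit, remap):
    """Drain spooled writes, deduping before anything reaches the disk."""
    while True:
        logical_offset, data = await write_queue.get()
        digest = hashlib.sha256(data).digest()
        existing = hash_table.get(digest)
        if existing is not None:
            remap(logical_offset, existing)                    # duplicate: reference only
        else:
            hash_table[digest] = commit(logical_offset, data)  # first copy: real write
        write_queue.task_done()
```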
I guess this would add way too much delay to be implemented inside the kernel the way ZFS does it? (Not tested, but I plan on doing so on top of QubesOS, since they now ship the requirements for dom0, while the installer doesn't permit ZFS partitioning and pool creation, so templates and qubes are created in a stage-2 install, as is done for BTRFS.)
I was interested in reading your thoughts on why BTRFS doesn't offer live deduplication, only the offline deduplication that bees provides.
Thanks!