The documentation is shit though; I spent a full hour digging through forum answers just trying to increase my swap space. All this bpool/zpool crap.
And then my system's /boot filled up with snapshots (wtf, I never asked for those), apt-get failed to live up to its promise of magic, and I got more hell about some 20% free-space preservation rule (again, wtf).
I was cutting and pasting some zpool zsysctl zc -a -f -foo -bar and then some sudo zfs destroy bpool npool zpool/blah/autosys@ubuntu_h2h3rc4h stuff off the forums, none of which I understood, until apt-get worked again.
I didn't understand a word of it, and there was zero documentation in the obvious places.
Solaris used to come with a user manual; it was easily the best thing about the buying experience, because it was so detailed and so obviously written for engineers.
If you can find one of those manuals on the internet you will be set for life on understanding ZFS and dtrace.
There’s also a C, C++ and ASM manual bundled with it, but you can skip those.
If you can’t find any, I’ll send you mine.
I know it sounds like I’m asking you to RTFM, but the experience of reading these manuals is genuinely a joy.
Well, it's an ROI judgement call at the end of the day. ZFS explicitly takes on significantly more irreducible complexity than ext4 so that it can tie the features it offers together into one cohesive system that is consistent for users to work with and for developers to maintain while introducing as few bugs as possible.
This reminds me that I wrote a bit a while back about the occasionally nontrivial challenge of absorbing complex structures (https://news.ycombinator.com/item?id=25760518) when learning new concepts. I would personally absolutely love to be able to ingest complex ideas without having to deal with the heightened semantic overhead, but I think that might be the human-learning equivalent of the P=NP problem. (The only solution seems to be finding neat, elegant representations that happen to take advantage of subconscious shortcuts intrinsic to how we reason about the world, but sadly there appears to be no research into how to find and exploit those paths.)
ZFS itself sadly seems to suffer from an above-average "newbies on soapboxes" problem - a bit like the Rust community's "memory safety" crowd who don't completely understand what's going on, except in ZFS' case there are more than a few people who know just enough to be dangerous and are excellent at articulating themselves, loudly.
The collective consensus about ZFS is thus mostly composed of many small pieces of arguably-technically-correct anecdata that each miss just enough nuance that the overall perspective ends up shifted from reality by a nontrivial amount. My current favorite observable example is this downvoted (grey) comment by one of the ZFS developers clarifying that the system truly doesn't need a lot of RAM to run: https://news.ycombinator.com/item?id=11898292
Completely independently of this vocal-minority problem, ZFS' licensing situation inhibits the kind of direction and leadership that would produce a cohesive, holistic platform integration effort along with supporting documentation. There are hobbyists figuring things out as they go along on one side, and commercial vendors providing SLA'd documentation as part of their enterprise support on the other, with a giant hole in the middle - one that would normally be filled by a kernelspace documentation effort (which would be absolutely rock solid and excellent), if the license didn't keep ZFS out of the kernel tree.
So if you can ignore all the vocal minorities and read between enough of the lines of the documentation you can find (is this valid for my OpenZFS version? is this FreeBSD-kernel specific? was this written by someone who knows what they're talking about? etc), you should be fine. You just have to accept the status quo and the tug of war that sadly tagged along with the excellent codebase.
Regarding the specific use case you described, I would point out a few details:
- AFAIK, ZFS pools (aka-but-not-exactly a partition) can be grown (add a vdev, or swap in bigger disks with autoexpand on) but can't easily be shrunk; shrinking one generally means recreating it
- When I was setting up my own ZFS configuration I repeatedly ran into warnings (without even looking for them) in setup guides and GitHub issue comments that swap on ZFS can cause deadlocks - hopefully you were using a plain Linux swap partition (if you do want swap on ZFS anyway, see the sketch after this list)
- You aren't required to use the defaults of "bpool" and "rpool" (I chose my own pool and dataset names)
- Automatic snapshots of /boot sound like a misconfiguration (I never configured snapshots)
- Naturally I can only say that blindly copypasting commands that directly edit your filesystem is an excellent way to say goodbye to your data, with extra steps
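To make a few of those points concrete, here's a hedged sketch rather than anything authoritative - the pool and device names are invented, and the swap-zvol property soup is the one the OpenZFS FAQ suggests:

    # Growing a mirror in place (pool/device names invented):
    zpool set autoexpand=on rpool
    zpool replace rpool old-disk-id bigger-disk-id   # once per mirror member
    zpool online -e rpool bigger-disk-id             # claim the new space

    # Swap on a zvol, per the OpenZFS FAQ (deadlock caveat still applies):
    zfs create -V 4G -b $(getconf PAGESIZE) \
        -o compression=zle -o logbias=throughput -o sync=always \
        -o primarycache=metadata -o secondarycache=none \
        -o com.sun:auto-snapshot=false rpool/swap
    mkswap -f /dev/zvol/rpool/swap
    swapon /dev/zvol/rpool/swap

    # And for a /boot full of snapshots: look before you destroy.
    zfs list -t snapshot -r bpool
    zfs destroy bpool/BOOT/ubuntu@some-snapshot      # name is illustrative

As I understand it, sync=always and primarycache=metadata are there to keep the swap path from competing with the ARC for memory mid-swapout, which is roughly where the deadlock reports come from.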
My own experience was with configuring ZFS from scratch on Debian (following https://openzfs.github.io/openzfs-docs/Getting%20Started/Deb..., but mostly ignoring the Now We Setup 23489573647823 Separate Datasets Because We Can bits - I just have /debian (/), /data and /boot).
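In sketch form, that minimal layout is something like the following - the pool name is just what I picked, the disk paths are placeholders, and the guide linked above covers the real details (including the separate feature-restricted pool GRUB needs for /boot):

    # Run from the installer/live environment; -R /mnt keeps all mounts
    # under /mnt instead of over the running system. Disk IDs are made up.
    zpool create -R /mnt -o ashift=12 -O compression=lz4 -O mountpoint=none \
        rpool mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB
    zfs create -o mountpoint=/     rpool/debian
    zfs create -o mountpoint=/data rpool/data
    # /boot goes on its own small pool created with only the feature flags
    # GRUB can read - see the guide for the exact zpool create invocation.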
It sounds like you were fighting Ubuntu's autoconfigured setup, without understanding what defaults it picked or what things it did for you at install time. This is not at all ideal from a system-comprehension perspective.
So, it might be fair to shift some (maybe more than some) of the blame to Ubuntu for your negative experience, rather than pinning all of that negativity on ZFS itself.
I can highly recommend setting up Debian on a physical machine (that always helps me greatly; I find VMs too... intangible) and yelling at it until a) you understand everything you've done and b) ZFS works. In that order. Getting it working without a complete reinstall (ie breaking something, then fixing the breakage, without having to wipe and start again) would also be an excellent goal... perhaps by the 2nd or 3rd reinstall :P
I've also found it extremely useful to create a ~4GB or so partition at the end of my disk to hold a "recovery" system: I debootstrap Debian onto the recovery partition, boot that, install ZFS support, configure pools and datasets, debootstrap Debian onto the new dataset, and get that working (it might take a couple of goes - I always miss a step, like configuring GRUB, remembering to run `passwd`, or editing /etc/fstab), and then I have a 4GB ext4 partition that knows how to mount ZFS if I ever need it.
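Compressed into a sketch (device names hypothetical, and from memory rather than a transcript of what I actually typed):

    # Recovery partition bootstrap; /dev/sda4 is a made-up name for the
    # spare ~4GB partition at the end of the disk.
    mkfs.ext4 /dev/sda4
    mount /dev/sda4 /mnt
    debootstrap stable /mnt http://deb.debian.org/debian
    for d in dev proc sys; do mount --rbind /$d /mnt/$d; done
    chroot /mnt /bin/bash
    # ...then inside the chroot: apt-get install linux-image-amd64 zfs-dkms
    # zfsutils-linux grub-pc, run passwd, write /etc/fstab, update-grub.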
This strategy actually came in handy big-time a couple of months back when I broke GRUB boot (an unrelated issue) and was able to fall back to booting the recovery partition to get everything working again: the recovery system could mount / as needed, so I could chroot into my main system and reinstall GRUB correctly.
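The actual rescue was more or less this, for reference (pool name assumed to be rpool):

    # From the recovery system; -R mounts everything under /mnt
    # instead of over the live recovery environment.
    zpool import -R /mnt rpool
    for d in dev proc sys; do mount --rbind /$d /mnt/$d; done
    chroot /mnt grub-install /dev/sda   # made-up disk name
    chroot /mnt update-grub
    # then exit and reboot into the (hopefully) repaired main system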
This appears to be specific to RAIDZ though, which I don't think I'm using - just mirrors at the moment.
Incidentally, I'm currently working through the logistics of a potential upgrade and am seriously torn between switching to RAIDZ2 and sticking with a couple of independent mirror pools. The "all disks thrash on rebuild after a failure" property of RAIDZ2 has me scrambling for the sequential simplicity of mirrors, but... giant contiguous datasets... hmmmmmm...
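For concreteness, the two shapes I'm weighing look like this (disk IDs made up):

    # Option A: two independent mirror pools - a resilver only touches one pair
    zpool create tank1 mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB
    zpool create tank2 mirror /dev/disk/by-id/diskC /dev/disk/by-id/diskD

    # Option B: one RAIDZ2 pool - any two disks can fail and you get one big
    # contiguous namespace, but a rebuild reads from every surviving disk at once
    zpool create tank raidz2 /dev/disk/by-id/diskA /dev/disk/by-id/diskB \
        /dev/disk/by-id/diskC /dev/disk/by-id/diskD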
I'm done with ZFS. Back to ext4. It just works.