I wish there were authoritative books or papers on how to build object stores like S3, or software-defined storage in general. It looks like object stores in public clouds have been so successful that few companies or research groups work on their own. Yes, I'm aware of systems like MinIO and Ceph, but so many questions are left unanswered: how to achieve practically unlimited throughput like S3 does; what kind of hardware would be optimal for S3's workload; how to optimize for the large scans incurred by analytics workloads, which S3 is really good at; how to support strong consistency like S3 does without visibly impacting performance, even though S3 internally must have a metadata layer, a storage layer, and an index layer; how to shrink or expand clusters without impacting user experience; how to write an OSD that squeezes out every bit of hardware performance (Vitastor claims to, but there aren't many details); and the list goes on.
> how to achieve practically unlimited throughput like S3 does
> like what kind of hardware would be optimal for S3's workload
> how to support strong consistency like S3 does without impacting system performance
I think most of these questions are first and foremost _hardware_, _money_ and _physics_ questions.
I have no expertise in this matter, but I think this would be a good proxy answer: you make a system _seem_ to have unlimited throughput by interconnecting _a lot_ of the fastest available storage with the fastest available network cards, and putting it as close as possible to where your users are going to be. All of this is extremely expensive, so you would need deep pockets, as Amazon has, to make it possible (or a lot of investment).
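To make the "interconnect a lot of hardware" answer a bit more concrete, here's a back-of-envelope sketch. All the numbers below are made-up assumptions for illustration, not actual S3 figures: the point is just that aggregate throughput is roughly min(disk bandwidth, NIC bandwidth) per node, multiplied by node count.

```python
def aggregate_gbps(nodes: int, disks_per_node: int,
                   disk_gbps: float, nic_gbps: float) -> float:
    """Rough aggregate read throughput of a cluster, in Gbit/s.

    Each node can serve at most the smaller of its total disk bandwidth
    and its NIC bandwidth; the cluster scales that linearly with node count.
    """
    per_node = min(disks_per_node * disk_gbps, nic_gbps)
    return nodes * per_node

# Hypothetical cluster: 1,000 nodes, 12 HDDs at ~2 Gbit/s each, 25 Gbit/s NICs.
# Each node is disk-bound (24 Gbit/s of disk vs. a 25 Gbit/s NIC),
# so the cluster tops out around 24 Tbit/s aggregate.
print(aggregate_gbps(1000, 12, 2.0, 25.0))  # 24000.0
```

With enough nodes, that aggregate number exceeds what any single tenant can pull, which is what makes throughput look "unlimited" from the outside.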
I suspect that with the right hardware and physical deployment locations you could fine-tune Ceph or MinIO or whatnot to performance similar to S3's. S3 is an object store, so the distributed-systems aspects of its implementation should be a lot easier than, say, distributed SQL (not that either is an "easy" thing to accomplish).
If you are interested in which hardware to use for a SAN, I found these benchmarks that may be exactly what you are looking for :-) [1]
> I think most of these questions are first and most importantly _hardware_, _money_ and _physics_ questions.
Actually, money (more accurately, cost) is a constraint rather than a resource. S3 is known for its low cost, and S3 can easily dole out a 70% discount to its large customers and still make a profit. So an interesting question is how to build a low-cost object store.
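One concrete lever here (my own framing, not anything AWS has published) is redundancy overhead: 3x replication stores three raw bytes per user byte, while erasure coding with k data shards and m parity shards stores only (k + m) / k raw bytes per user byte, at the price of more complex reads and repairs.

```python
def raw_bytes_per_user_byte(k: int, m: int) -> float:
    """Raw storage consumed per user byte under a (k, m) erasure code.

    (1, 2) models plain 3x replication: one data copy plus two extras.
    """
    return (k + m) / k

print(raw_bytes_per_user_byte(1, 2))   # 3.0  -> 3x replication
print(raw_bytes_per_user_byte(10, 4))  # 1.4  -> ~2.1x less raw disk
```

Dropping from 3.0 to 1.4 raw bytes per user byte cuts the biggest hardware line item by more than half, which is one reason large object stores can discount aggressively and still profit.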
> interconnecting _a lot_ of the fastest available storage
"A lot of" leads to real challenges. Sooner or later you'll find that managing metadata will become a tough challenge. Case in point, Open-source systems often use systems like Zookeeper or etcd or single-node name server with hot standby for metadata management, which certainly won't be able to handle the scale of S3.
About cost, see [1]. Also, S3 prices have been increasing, and there have been a bunch of alternative object-store offerings from other companies. I think people here on HN often comment about the increasing costs of AWS offerings.
Distributed systems and consensus are inherently hard problems, but there are a lot of implementations you can study (like etcd, which you mention, or NATS [2], which I've been playing with and which looks super cool so far :-p) if you want to understand the internals, on top of the many books and papers that have been published.
Again, I never said it was "easy" to build distributed systems, I just don't think there's any esoteric knowledge to what S3 provides.
Sorry, this is just anecdotal from my recollection of reading random Hacker News threads; I think people talk about the bandwidth being expensive more than the storage itself [1].
It's not documented in a single authoritative place, but AWS has documented much of the S3 architecture you are curious about via white papers and presentations at re:Invent and other conferences.
I haven't used it for anything that complicated yet, but this is the kind of stuff I have found GPT-4 really useful for, personally. Research, not refactoring.