I wish there were authoritative books or papers on how to build object stores like S3, or software-defined storage in general. It looks like object stores in public clouds have been so successful that few companies or research groups work on their own. Yes, I'm aware of systems like MinIO and Ceph, but so many questions are left unanswered: how to achieve practically unlimited throughput like S3 does; what kind of hardware would be optimal for S3's workload; how to optimize for the large scans incurred by analytics workloads, which S3 is really good at; how to support strong consistency like S3 does without visibly impacting performance, even though S3 internally must have a metadata layer, a storage layer, and an index layer; how to shrink or expand clusters without impacting user experience; how to write an OSD that squeezes out every bit of hardware performance (Vitastor claims to, but there aren't many details); and the list goes on.
> how to achieve practically unlimited throughput like S3 does
> like what kind of hardware would be optimal for S3's workload
> how to support strong consistency like S3 does without impacting system performance
I think most of these questions are first and foremost _hardware_, _money_ and _physics_ questions.
I have no expertise in this matter, but I think this would be a good proxy answer: you make a system _seem_ to have unlimited throughput by interconnecting _a lot_ of the fastest available storage with the fastest available network cards, and putting it as close as possible to where your users are going to be. All of this is extremely expensive, so you would need deep pockets, as Amazon has, to make it possible (or a lot of investment).
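To make the "interconnect a lot of hardware" answer a bit more concrete, here's a back-of-envelope sketch. All the numbers below are made-up assumptions for illustration, not actual S3 figures: the point is just that aggregate throughput is roughly min(disk bandwidth, NIC bandwidth) per node, multiplied by node count.

```python
def aggregate_gbps(nodes: int, disks_per_node: int,
                   disk_gbps: float, nic_gbps: float) -> float:
    """Rough aggregate read throughput of a cluster, in Gbit/s.

    Each node can serve at most the smaller of its total disk bandwidth
    and its NIC bandwidth; the cluster scales that linearly with node count.
    """
    per_node = min(disks_per_node * disk_gbps, nic_gbps)
    return nodes * per_node

# Hypothetical cluster: 1,000 nodes, 12 HDDs at ~2 Gbit/s each, 25 Gbit/s NICs.
# Each node is disk-bound (24 Gbit/s of disk vs. a 25 Gbit/s NIC),
# so the cluster tops out around 24 Tbit/s aggregate.
print(aggregate_gbps(1000, 12, 2.0, 25.0))  # 24000.0
```

With enough nodes, that aggregate number exceeds what any single tenant can pull, which is what makes throughput look "unlimited" from the outside.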
I suspect that with the right hardware and physical deployment locations you could fine-tune Ceph or MinIO or whatnot to performance similar to S3's. S3 is an object store, so the distributed-systems aspects of its implementation should be a lot easier than, say, distributed SQL (not that either is an "easy" thing to accomplish).
If you are interested in which hardware to use for a SAN, I found these benchmarks that may be exactly what you are looking for :-) [1]
> I think most of these questions are first and most importantly _hardware_, _money_ and _physics_ questions.
Actually, money (more accurately, cost) is a constraint rather than a resource. S3 is known for its low cost, and S3 can easily dole out a 70% discount to its large customers and still make a profit. So an interesting question is how to build a low-cost object store.
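One concrete lever here (my own framing, not anything AWS has published) is redundancy overhead: 3x replication stores three raw bytes per user byte, while erasure coding with k data shards and m parity shards stores only (k + m) / k raw bytes per user byte, at the price of more complex reads and repairs.

```python
def raw_bytes_per_user_byte(k: int, m: int) -> float:
    """Raw storage consumed per user byte under a (k, m) erasure code.

    (1, 2) models plain 3x replication: one data copy plus two extras.
    """
    return (k + m) / k

print(raw_bytes_per_user_byte(1, 2))   # 3.0  -> 3x replication
print(raw_bytes_per_user_byte(10, 4))  # 1.4  -> ~2.1x less raw disk
```

Dropping from 3.0 to 1.4 raw bytes per user byte cuts the biggest hardware line item by more than half, which is one reason large object stores can discount aggressively and still profit.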
> interconnecting _a lot_ of the fastest available storage
"A lot of" leads to real challenges. Sooner or later you'll find that managing metadata will become a tough challenge. Case in point, Open-source systems often use systems like Zookeeper or etcd or single-node name server with hot standby for metadata management, which certainly won't be able to handle the scale of S3.
About cost, see [1]. Also, S3 prices have been increasing, and there have been a bunch of alternative object-store offerings from other companies. I think people here on HN often comment about the increasing costs of AWS offerings.
Distributed systems and consensus are inherently hard problems, but there are a lot of implementations you can study (like etcd, which you mention, or NATS [2], which I've been playing with and which looks super cool so far :-p) if you want to understand the internals, on top of the many books and papers that have been published.
Again, I never said it was "easy" to build distributed systems, I just don't think there's any esoteric knowledge to what S3 provides.
Sorry, this is just anecdotal from my recollection of reading random Hacker News threads; I think people talk about the bandwidth being expensive more than the storage itself [1].
It's not documented in a single authoritative place, but AWS has documented much of the S3 architecture you are curious about via white papers and presentations at re:Invent and other conferences.
I haven't used it for anything that complicated yet, but this is the kind of stuff I have found GPT-4 really useful for, personally. Research, not refactoring.