The simple joys of scaling up (motherduck.com)
150 points by eatonphil on May 18, 2023 | 45 comments


Probably biased given that it's on the DuckDB site, but it's well-reasoned and referenced, and my gut agrees with the overall philosophy.

This feels like the kicker:

> In the cloud, you don’t need to pay extra for a “big iron” machine because you’re already running on one. You just need a bigger slice. Cloud vendors don’t charge proportionally more for a larger slice, so your cost per unit of compute doesn’t change if you’re working on a tiny instance or a giant one.

It's obvious once you think about it: you aren't choosing between a bunch of small machines and one big machine, you may very well be choosing between a bunch of small slices of a big machine and one big slice of the same big machine. The only difference is in how your software sees it: as a complex distributed system, or as a single system (that can, e.g., share memory with itself instead of serializing and deserializing data over network sockets).
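
As a toy illustration of that last point (not from the article; absolute numbers vary wildly by machine and payload), here's roughly what crossing a serialization boundary costs compared to just reading the data in-process:

  # Toy illustration: reading data in-process vs. paying a serialize/deserialize
  # round trip, which is the floor cost of shipping it over a network socket.
  import pickle
  import time

  rows = [{"id": i, "value": i * 0.5} for i in range(1_000_000)]

  t0 = time.perf_counter()
  total = sum(r["value"] for r in rows)   # shared-memory "query": just read it
  t1 = time.perf_counter()

  blob = pickle.dumps(rows)               # what a network hop forces on you:
  rows_copy = pickle.loads(blob)          # serialize, copy, deserialize
  t2 = time.perf_counter()

  print(f"in-process scan:         {t1 - t0:.3f}s")
  print(f"serialize + deserialize: {t2 - t1:.3f}s  ({len(blob) / 1e6:.0f} MB)")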


The reason this feels non-obvious is that people like to think that they're choosing a variable number of small slices of a big datacenter, scaling up and down hour-by-hour or minute-by-minute to get maximum efficiency.

Really, though, you're generating enormous overhead while turning on and off small slices of a 128-core monster with a terabyte of RAM.


That's not the only difference: there are many more facets to reliability guarantees than the brief hand-waving the author gives them in the article.


(i work at motherduck)

over my career i’ve seen countless cases where the complexity of a distributed system introduces as many (or more) reliability concerns as it solves. when everything works well, it can be beautiful. but when things go wrong, they can quickly become a cascading mess of hard-to-debug issues.


FYI, this is from the MotherDuck blog. MotherDuck and DuckDB have a partnership, but they are separate entities.

> DuckDB Labs from its inception has received many investment offers from venture capitalists, but eventually decided not to take venture capital, and rather opted for a business model where it works with a few important partners. “We look forward to the continued growth and adoption that partnering with MotherDuck will bring. Thanks to the MotherDuck team, we are able to focus our efforts on independent innovation of the DuckDB core platform and growth of the open-source community.” says Mühleisen, who currently combines a tenured research position at CWI with the role of CEO at DuckDB Labs.

https://www.cwi.nl/en/news/cwi-spin-off-duckdb-labs-partners...

https://duckdblabs.com/news/2022/11/15/motherduck-partnershi...

https://www.techtarget.com/searchdatamanagement/news/2525272...


Never underestimate the value of a large dedicated server.

For ~$100/mo, you can get:

  AMD Ryzen 9 7950
  16-cores
  128GB RAM
  4TB NVMe
https://www.hetzner.com/dedicated-rootserver/ax102/configura...

For ~$800/mo, you can get:

  AMD EPYC 7763
  64-cores
  256GB RAM (up to 1TB)
  4TB NVMe (up to 12TB)
https://www.ovhcloud.com/en/bare-metal/scale/scale-7/


Along those lines, the graph didn't show Cerebras' 2.6-trillion-transistor wafer-scale WSE-2 deep learning accelerator.

Be interesting to see if wafer scale becomes more common. I would think wafer scale CPU core farms would help cloud providers deliver greater one-big-server goodness.

  850,000 cores
  40GB of RAM distributed among cores
  220Pb/s of fabric bandwidth
  Result: 7,500 trillion FP16 operations per second
Product page: https://www.cerebras.net/product-chip/

Tech dive: https://www.techinsights.com/blog/cerebras-dives-wse-archite...


Is it just me or is that very little RAM? What kind of workloads would you run on it?


The accelerator contains 40GB of SRAM.

It's cache or scratchpad memory. Keep in mind this is "on chip"; most systems, including this one, have additional memory off-chip in the form of DRAM sticks.

edit: For comparison, the NVIDIA A100 has up to 164kB of shared memory per SM while the wafer beast has about 47kB per core if you divide it out.


Ionos have pretty good value dedicated servers too:

https://www.ionos.com/servers/dedicated-servers


[flagged]


Terrible offers. Only a single HDD/SSD. Also, you're posting affiliate links.


Those offers are optimized for CPU and memory. You can always add storage. Tell me where you can get a better offer.

An affiliate link? So what?


This is absolutely correct (and gets more correct every year).

An m5.large (2 vCPU, 8GB RAM) is $0.096/hr. An m5.24xlarge (96 vCPU, 384GB RAM) is $4.608/hr.

Exactly a 1:48 scale-up, in both capacity and cost.

The largest AWS instance is x2iedn.32xlarge (128 vCPU, 4096GB RAM) for $26.676/hr. Compared to m5.large, a 64x increase in compute and 512x increase in memory for 277x the cost.

Long story short: you can scale up linearly for a long time in the cloud.
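
To make the arithmetic explicit (using the on-demand prices quoted above; exact figures vary by region and change over time):

  # Back-of-the-envelope check of the per-unit pricing quoted above.
  # (vCPU, GB RAM, on-demand $/hr) -- figures from the comment, not current prices.
  instances = {
      "m5.large":        (2,   8,    0.096),
      "m5.24xlarge":     (96,  384,  4.608),
      "x2iedn.32xlarge": (128, 4096, 26.676),
  }

  base_vcpu, base_ram, base_price = instances["m5.large"]
  for name, (vcpu, ram, price) in instances.items():
      print(f"{name:16s} {vcpu / base_vcpu:4.0f}x vCPU  {ram / base_ram:4.0f}x RAM  "
            f"{price / base_price:6.1f}x cost  ${price / vcpu:.3f} per vCPU-hour")

Both m5 sizes work out to the same price per vCPU-hour; the x2iedn premium is buying the extra memory, not the compute.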


Except, it’s hard to do vertical scaling dynamically.

That kind of kills this whole argument except in certain pretty unusual cases.


> Except, it’s hard to do vertical scaling dynamically.

Easiest to just create a new, vertically scaled-up instance and swap over to it.
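
For what it's worth, a rough boto3 sketch of that swap, assuming you've baked an AMI of the current box and use an Elastic IP as the cut-over point (the AMI, allocation ID, and instance type below are placeholders):

  # Hypothetical swap-over sketch: launch a bigger replacement, then re-point
  # the Elastic IP at it. IDs below are placeholders, error handling omitted.
  import boto3

  ec2 = boto3.client("ec2")
  AMI_ID = "ami-0123456789abcdef0"        # image baked from the current instance
  ALLOCATION_ID = "eipalloc-0abc1234"     # the Elastic IP clients point at

  resp = ec2.run_instances(
      ImageId=AMI_ID,
      InstanceType="m5.4xlarge",          # the bigger slice
      MinCount=1,
      MaxCount=1,
  )
  new_id = resp["Instances"][0]["InstanceId"]
  ec2.get_waiter("instance_running").wait(InstanceIds=[new_id])

  # Cut over; the old, smaller instance can then be drained and terminated.
  ec2.associate_address(AllocationId=ALLOCATION_ID, InstanceId=new_id)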


How many times do you need to scale, in the usual case?


What’s the “usual case”? We scale up and down thousands of times a day.


Yeah, but the argument being made is that there's overhead in that, primarily the extra dev time needed to build such a system and the extra failure points introduced, which require more maintenance effort.

Those two items can be substantial (5x headcount), or they can be merely "standard" (2x headcount).

They're never zero extra headcount.


> What’s the “usual case”?

Whatever isn't the "unusual cases" you mentioned.

> We scale up and down thousands of times a day.

Sounds unnecessary for most cases.


I think for most use cases, overprovisioning a single instance to meet max demand will still be cheaper than dynamic provisioning with the synchronization overhead of a distributed system.
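
As a back-of-the-envelope illustration (the load profile below is made up, and the prices are the m5 on-demand figures quoted elsewhere in the thread), the raw compute gap is what the extra dev and ops effort has to beat:

  # Made-up numbers, just to show the shape of the trade-off: one big box sized
  # for peak vs. m5.large instances scaled to a hypothetical hourly demand curve.
  import math

  PRICE_LARGE = 0.096    # m5.large (2 vCPU) on-demand $/hr, from the thread
  PRICE_24XL = 4.608     # m5.24xlarge (96 vCPU) on-demand $/hr

  # Hypothetical daily demand in vCPUs: quiet nights, a daytime peak near 96.
  demand = [8] * 8 + [48] * 8 + [96] * 4 + [48] * 4   # 24 hourly samples

  big_box = PRICE_24XL * 24
  autoscaled = sum(math.ceil(vcpus / 2) * PRICE_LARGE for vcpus in demand)

  print(f"one 96-vCPU box for 24h:   ${big_box:7.2f}")
  print(f"autoscaled m5.large fleet: ${autoscaled:7.2f}")
  # The difference is the budget you have for the extra dev time, failure modes,
  # and synchronization overhead of the dynamically provisioned setup.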


> A m5.large (2 vCPU, 8GB RAM) is $0.096/hr. m5.24xlarge (96 vCPU, 384GB RAM) is $4.608/hr. Exactly 1:48 scale up, in capacity and cost.

Only if you're paying on-demand prices. If you're paying spot prices, the scaling is much less linear.


I am in two minds about this article; while the author makes some excellent points, they're leaving out some advantages of scale-out such as DR, blue-green deployment and multi-region.

But one other thing I think they fail to cover is this: most people don't need to build distributed systems anywhere near as soon as they think.

Scale-up makes the most sense for startups and small projects that haven't gained traction or scale yet. It's perfectly possible to architect your applications ready for the day when you _do_ need to scale out, but without spending time and money on the complexities of it until you actually _need_ to.


I agree. While data can always outgrow any single machine's ability to manage and process it in an optimal manner, the right software can allow larger data sets to be performant without resorting to distributed systems. If the software takes full advantage of all the CPU cores and has an intelligent caching strategy, then some amazing things can happen on affordable hardware.

I am building a new kind of data management system. It can manage unstructured data like a file system or an object store. It can build very large relational tables and query them quickly. Its flexible schema allows the management of semi-structured data like NoSQL systems. (https://didgets.com/)

It has distributed features built into the architecture, but I have only implemented the single node so far. On a $2000 desktop computer it can easily manage 200 million files and tables with 100M+ rows or thousands of columns.


I've been following DuckDB for a while now, and even tinkered with a layer on top called "IceDB" (totally needs a rewrite: https://blog.danthegoodman.com/introducing-icedb--a-serverle...)

The issue I see now is that there is no good way to know what files will match well when reading from remote (decoupled) storage.

While it does support hive partitioning (thank god) and S3 list calls, if you are doing frequent inserts you need some way to merge the many small parquet files that result.

The MergeTree engine is my favorite thing about ClickHouse, and why it's still my go-to. I think if there was a serverless way to merge parquet (which was the aim of IceDB) that would make DuckDB massively more powerful as a primary OLAP db.
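
For what it's worth, a minimal compaction sketch with the duckdb Python package (bucket and paths are made up; S3 credentials, partition discovery, and swapping the compacted file back into your metadata layer are all left out):

  # Hypothetical compaction sketch: read the many small Parquet files of one
  # hive partition and rewrite them as a single file. Paths are placeholders.
  import duckdb

  con = duckdb.connect()
  con.execute("INSTALL httpfs;")   # S3 reads; region/credentials config omitted
  con.execute("LOAD httpfs;")

  con.execute("""
      COPY (
          SELECT *
          FROM read_parquet('s3://my-bucket/events/dt=2023-05-18/*.parquet')
      )
      TO 'dt=2023-05-18-compacted.parquet' (FORMAT PARQUET);
  """)
  # Upload the compacted file and retire the small ones however your
  # table-metadata layer (IceDB, Iceberg, etc.) expects.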


Yea, DuckDB is a slam dunk when you have a relatively static dataset - object storage is your durable primary SSOT, and ephemeral VMs running duckdb pointed at the object storage parquet files are your scalable stateless replicas - but the story gets trickier in the face of frequent ongoing writes / inserts. ClickHouse handles that scenario well, but I suspect the MotherDuck folks have answers for that in mind :)


IceDB sounds like an interesting project. What are your thoughts on Apache Iceberg? Seems like a similar idea, and gaining a lot of traction in the data-eng world.


Reminiscent of Adrian Cockcroft's "Gigalith" architecture.

Not sure on the economy of scale though - if you need redundancy in a scale-up architecture you essentially need parity compute sitting idle (likely in another location/region/data centre).

If you have an HA requirement your infra spend has just doubled, and the ROI on your DR tends to nothing (unless you can do something clever with on-demand DR)? You're also replicating everything you write to DR (so network and storage costs there).

Also more practically, you’re just less flexible. Waiting 30 minutes for a machine to checksum 1TB of memory before loading the OS is painful in an outage scenario (every reboot hurts, replenishing cache hurts, losing a massive instance hurts). Do like the sentiment around simplicity though!


> If you have an HA requirement your infra spend has just doubled, and the ROI on your DR tends to nothing (unless you can do something clever with on-demand DR)? You're also replicating everything you write to DR (so network and storage costs there).

You're quite correct, but it's not such a simple trade-off.

1. What's a "disaster"? A) "Datacenter burns down" is very different to B) "Instance got overloaded with unexpected spike, need more capacity". For A), your distributed system will still have outages in a particular area, so you'll spin up new instances in the nearest live DC, at a time when everyone else is trying to do the same! And B) is not a disaster.

2. Disaster recovery is, by definition alone, an exceptional event. If you're reaching for DR protocols more than once a year, there's something seriously wrong. Being able to scale out means that you only need a few minutes before you're back up to capacity, but how much it's worth to shave off (say) 30 minutes in a genuine black swan event depends on the exact business. I've seen fintechs, FAANGs, Fortune 100 companies, banks, etc. have hours of downtime with no apparent negative effects. It's a black swan event that's not an extinction-level event, nor even close.

A bigger problem for the business is that if a black swan event that results in:

> Waiting 30 minutes for a machine to checksum 1TB of memory before loading the OS is painful in an outage scenario

turns into an ELE, then it's the business that has problems, not the systems. A business that is so fragile to downtime is not going to be in business much longer anyway.


> If you have an HA requirement your infra spend has just doubled

Doesn't matter when it's still a lot cheaper than some of the big clouds.

You can also do HA to the cloud with a hybrid approach, so on-demand infra is only used to fill in when your main compute goes out.


This is an interesting post, thank you.

In my toy barebones SQL database, I distribute rows across different replicas based on a consistent hash. I also have a "create join" statement, which keeps join keys colocated.

Then when a join query is issued, I can always join because the join keys are available locally; the join query can be executed on each replica and the results returned to the client to be aggregated.

I want building distributed, high-throughput systems to be easier and less error-prone. I wonder if a mixture of scale-up and scale-out could be a useful architecture.

You want as few network round trips or crossovers between threads (synchronization cost) as you can get.
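
A rough sketch of the placement idea described above (not the actual project's code; a hash ring over a fixed replica list, where the declared join key decides where a row lands):

  # Rough sketch: rows hash to a replica by their join key, so rows that join
  # on the same key land on the same replica and the join stays local.
  import bisect
  import hashlib

  REPLICAS = ["replica-0", "replica-1", "replica-2"]
  VNODES = 64   # virtual nodes per replica, to smooth the distribution

  def _hash(key: str) -> int:
      return int(hashlib.md5(key.encode()).hexdigest(), 16)

  _ring = sorted((_hash(f"{r}#{v}"), r) for r in REPLICAS for v in range(VNODES))
  _points = [p for p, _ in _ring]

  def placement(join_key: str) -> str:
      i = bisect.bisect(_points, _hash(join_key)) % len(_ring)
      return _ring[i][1]

  # An orders row and a users row for the same user_id go to the same replica.
  print(placement("user:42"), placement("user:42"))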


FWIW I really like this new "neo-brutalist" website style with hard shadows, clear, solid lines and simple typography and layout.


Just make the black actually black and it's good. No need for dark gray on light gray.


All those enterprise architects being swayed by fancy Internet-scale architecture presentations from the FAANGs has led to Rococo scale-out astrotectures for systems operating at scales 5 to 10 orders of magnitude below where that would make sense.


It's the cloud companies raking in the profits of these hardware improvements. "Widely-available machines now have 128 cores and a terabyte of RAM," and I'm still paying five bucks for a couple of cores.


There's a whole world of disconnect between the enterprise-focused offerings (AWS, GCP, Azure) and what most people would personally use: stuff like DigitalOcean, Vultr, Scaleway, Contabo, Hetzner and so on.

It's gotten to the point where I could personally never justify going for the larger platforms outside of an enterprise setting - even if I run my own business, it'll probably be on one of the latter platforms, because they still have most of the features that I actually need, while being more affordable.


If you think that's bad, look at how much they earn on bandwidth.


> I'm still paying $5 bucks for a couple cores.

That's definitely on the cheap side as far as cloud goes.


I mean, you probably want 2 for resilience, but DRBD and an active/passive setup is simple enough and doesn't get into the usual fuckery of having a clustered DB.


You will always be limited by network throughput. Sure, that wire is getting bigger, but so is your data.


The i4i instances also have crazy fast disks to go along with that 1TB of RAM. I hope to move all our stuff off i3 instances and onto i4 this year.


Anybody have experience with k8s dynamically scaling up?


Sure, plenty of people, including me.


Are you thinking of scaling up your scale out?

You might be missing the point.


Nope, I read the article. Wonder if anybody uses k8s to migrate a monolith between different-sized VMs based on load.


Moore's law freezes in the cloud.



