
Sorry, should have clarified: I have doubts about their necessity (as I stated in another comment).

Most tech companies have poor knowledge of proper data modeling and SQL, leading to poor schema design and suboptimal queries. Combine that with the fact that networked storage (e.g. EBS) is the norm, and it’s no wonder people think they need another solution.

The amount of QPS you can get out of a single DB is staggering when it’s correctly designed and running on fast hardware with local NVMe disks (or a fast distributed storage layer). Consider that a modern NVMe drive can quite easily deliver 1,000,000+ IOPS.



It all depends on what kind of queries you're running. I came from the OLTP market, where you're generally doing single-row operations. Basic CRUD. Single table work on denormalized data.

Now go to OLAP, and a single query might be doing multiple table joins. It might be scouring billions of records. It might need to do aggregations. Suddenly "millions of ops" might be reduced to 100 QPS. If you're lucky.

And yes, that's even using fast local NVMe. It's just a different kind of query, with a different kind of result set. YMMV.


Not sure why you think OLTP doesn’t also do complex joins. In a properly normalized schema, you’ll likely have many.

But yes, OLAP is of course its own beast, and most DBs are suited for one or the other.


I think it's a matter of use case. Doing ad hoc data exploration on an OLTP system generally sucks the wind out of its performance. Even if you have some type of workload prioritization, isolation, and limitation, letting data scientists and business analysts wander freely through your production OLTP database sounds like a Bad Time.

The organization might say "Okay. Maybe you should do your ad hoc exploration on an OLAP system. Preferably our data warehouse where you can let your report run for hours and we won't see a production brownout while it's running."

So ad hoc queries in the warehouse generally can get away with more complex joins.


Distributed DBs usually provide a seamless fault-tolerance story, too.

> Consider that a modern NVMe drive can quite easily deliver 1,000,000+ IOPS.

There can be other bottlenecks, though. For example, I consistently find that the Linux kernel doesn't handle memory page allocation fast enough once disk traffic hits a few GB/s, because it does so in a single thread.



