
Sorry, should have clarified: I have doubts about their necessity (as I stated in another comment).

Most tech companies have poor knowledge of proper data modeling and SQL, leading to poor schema design and suboptimal queries. Combine that with the fact that networked storage (e.g. EBS) is the norm, and it’s no wonder people think they need another solution.

The amount of QPS you can get out of a single DB is staggering when it’s correctly designed and running on fast hardware with local NVMe disks (or a fast distributed storage layer). Consider that a modern NVMe drive can quite easily deliver 1,000,000+ IOPS.



It all depends on what kind of queries you're running. I came from the OLTP market, where you're generally doing single-row operations. Basic CRUD. Single table work on denormalized data.

Now go to OLAP, and a single query might be doing multiple table joins. It might be scouring billions of records. It might need to do aggregations. Suddenly "millions of ops" might be reduced to 100 QPS. If you're lucky.

And yes, that's even using fast local NVMe. It's just a different kind of query, with a different kind of result set. YMMV.


Not sure why you think OLTP doesn’t also do complex joins. In a properly normalized schema, you’ll likely have many.

But yes, OLAP is of course its own beast, and most DBs are suited for one or the other.


I think it's a matter of use case. Doing ad hoc data exploration on an OLTP system generally sucks the wind out of its performance. Even if you have some type of workload prioritization, isolation, and limitation, letting data scientists and business analysts wander freely through your production OLTP database sounds like a Bad Time.

The organization might say "Okay. Maybe you should do your ad hoc exploration on an OLAP system. Preferably our data warehouse where you can let your report run for hours and we won't see a production brownout while it's running."

So ad hoc queries in the warehouse generally can get away with more complex joins.


Distributed DBs usually provide a seamless fault-tolerance story, too.

> Consider that a modern NVMe drive can quite easily deliver 1,000,000+ IOPS.

There can be other bottlenecks, though. For example, I consistently find that the Linux kernel doesn't handle memory page allocation fast enough once disk traffic hits a few GB/s, because it does so in a single thread.



