doronlevari's comments

Great intro to Pg partitioning. I understand partitions can improve the performance of a single query by scanning fewer rows, but does anyone have an idea about the throughput implications? Is partitioning better or worse when it comes to 1000 small queries/updates per second? Thanks!
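(A minimal in-memory sketch of the "scanning fewer rows" point above, assuming a hypothetical table partitioned by month; this models partition pruning only, not the actual Postgres implementation.)

```python
from collections import defaultdict

# Partition key (year, month) -> list of rows. A query with a month
# predicate only has to scan one bucket instead of the whole table.
partitions = defaultdict(list)

def insert(row):
    # Route each row to its partition by the created_at key.
    year, month, _day = row["created_at"]
    partitions[(year, month)].append(row)

def query_month(year, month):
    # Partition pruning: scan only the single matching partition.
    return list(partitions[(year, month)])

for day in range(1, 29):
    insert({"created_at": (2014, 1, day), "amount": day})
    insert({"created_at": (2014, 2, day), "amount": day})

rows = query_month(2014, 1)
print(len(rows))  # 28 — scans only January's 28 rows, not all 56
```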


Hi, disclaimer - I work for ScaleBase, which provides true automated transparent sharding, so I've lived and breathed sharding for four years now...

The main problem is user/session concurrency. On one machine it kills you at some (near) point. A DB does much more work for every write than for every read (see my blog here: http://database-scalability.blogspot.com/2012/05/were-in-big...). The limit is here and now: even 100 heavy writing sessions will choke MySQL (or any SQL DB...) on any hardware.

Catch-22: scale out to replication slaves with R/W splitting? That can lower read load on the master DB, but read load is better lowered by caching. The real problem is writes and the small supporting transactional reads, and slaves won't help there. Distributing the data (sharding) is the only way to distribute write-intensive load, and it also helps reads by putting them on smaller chunks; parallelizing them is a sweet sweet bonus :)
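(A minimal sketch of the idea, assuming simple hash-based sharding on a user id - hypothetical names, not ScaleBase's actual routing: every write for a given user lands on exactly one shard, so write load spreads across shards instead of piling onto a single master.)

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2", "shard3"]

def shard_for(user_id):
    # Stable hash so the same user always routes to the same shard,
    # which also keeps the "small supporting transactional reads"
    # for that user on a single node.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Simulate 10,000 users' writes: each goes to exactly one shard.
counts = {s: 0 for s in SHARDS}
for uid in range(10_000):
    counts[shard_for(uid)] += 1
print(counts)  # roughly even write distribution across the four shards
```

Routing by a stable hash of the user id is the simplest scheme; range- or directory-based sharding trades that simplicity for easier rebalancing.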

From what I see around (hundreds of medium-to-large sites), there's no other way...

And one final word about the cloud: "one DB machine" there is limited to rather modest virtualized compute and I/O... In the cloud, the limits are here and now! Cloud is all about elasticity and scale-out.

Hope I helped! Doron

