As a long-term mysql user who has dabbled with production Postgres machines I still don't see how Postgres really has any huge advantages that would cause me to want to run it for any new projects and deal with a new and possibly daunting learning curve. Postgres seems to do some things better perhaps, and in general seems to have more flexibility than MySQL. But it also seems to lack focus as a end-to-end db solution.
The clustering / replication complexity of Postgres is a huge issue for me, since scaling databases is not a trivial thing in the best of cases. With postgres, there seem to be 10+ different cluster / replication systems (both open-source and commercial?) and it's just a mess.
Giving people tons of options is great, but ease of use for developers and an amazing & simple out of box experience is so important. MySQL won the hearts and minds of developers this way... and now MongoDB is doing the same thing by improving on what MySQL what able to accomplish. The Postgres team should consider implementing something similar.
I find postgres more pleasant and faster to develop in than mysql. It can help you avoid the need for a lot of application code, and feel more confident in your solution. I can't say any one thing that will convince you; you basically have to dive in and try doing things the "postgres way" (i.e. don't try to transplant a mysql mindset on postgres, or you will be disappointed). Try out the various features available, and see which ones might help you develop faster or avoid problems.
Postgres has strong support for single-master replication, including synchronous (several levels) or async, cascading, etc. Read scaling and HA are well covered.
What postgres doesn't have built-in is sharding or multi-master. I'm not trying to make excuses for this, but here are some thoughts to put things in perspective:
* There is active development on multi-master in postgres.
* You need to have quite a large database to not physically fit on a single node.
* By the time you really need to scale out writes beyond a single node, you may be very interested in the fine details of what's happening, and the flexibility and myriad options available in postgres may be just what you need.
* If you focus too much on one feature, it starts to seem more important than it is. It's easy to forget the flaws in the feature in other products; and the fact that businesses existed at high scale before the feature existed. Lack of a feature you want is more like a hurdle, not a dead-end.
>You need to have quite a large database to not physically fit on a single node.
If you are hosting in the cloud then pricing favours lots of smaller instances rather than single bigger ones. And many startups with limited budgets (e.g. me) still have significant data requirements.
> By the time you really need to scale out writes beyond a single node, you may be very interested in the fine details of what's happening, and the flexibility and myriad options available in postgres may be just what you need.
Or you can press a single button on most newer databases e.g. CouchDB, HBase and MongoDB and have sharding/clustering just work.
I don't know why people defend PostgreSQL on this and aren't pushing for a more coherent, in-built solution. It would definitely blunt a lot of the growth of NoSQL.
"I don't know why people defend PostgreSQL on this and aren't pushing for a more coherent, in-built solution."
I specifically said that I wasn't trying to make excuses, and that people are actively working on built-in multi-master replication.
I encourage you to keep pushing though. People have been finding weaknesses in postgres for a long time and those weaknesses have been disappearing quickly with releases delivered every year. All the while, some great innovations have been coming along that no other database has.
I've been involved in postgres for a long time and it's always interesting to think back on the way features have been demanded and then delivered. After multi-master, there will be another round of must-haves. But in the meantime, it's important to also work on new innovations that might ultimately be more important to the cloud use case than multi-master is.
One thing to consider is that because of its MVCC architecture (whereby reads can never block writes, and writes can never block reads), and especially with the locking changes coming in 9.2, you can probably get more concurrent activity out of a single, well-tuned Postgres instance than you could a slew of MySQL hosts of comparable size.
That wards off your need to go down the (admittedly sometimes complicated) path of PostgreSQL replication, and I say that as a guy who basically makes his living off setting up replicated/HA Postgres...
I'm well aware that InnoDB is MVCC-based. Unfortunately, its implementation of MVCC is hindered by the way the MySQL kernel handles locks, which are completely outside the domain of a storage engine.
Consider:
-- on MySQL, be sure to say "engine=innodb" before the semicolon...
CREATE TABLE locks_suck (id int primary key, val text);
-- or however you'd say "generate_series(start, end)" in MySQL...
INSERT INTO locks_suck (id) VALUES (generate_series(1, 10));
Now, say the following in one session:
BEGIN;
UPDATE locks_suck SET val = 'Transaction 1' WHERE id BETWEEN 1 AND 5;
-- Note that I haven't committed yet...
And then, in a second session:
BEGIN;
UPDATE locks_suck SET val = 'Transaction 2' WHERE id BETWEEN 6 AND 10;
That's the kind of concurrency MVCC can buy you. Readers can't block writers, writers can't block readers, and writers can only block other writers trying to write the same row.
Hey thanks for the example, I think your phrasing was a little unclear but clearly you know what you're talking about, so I reversed my downvote into an upvote. [edit: crap now I can't do that the buttons gone]
I also tried to look into the behavior and it turns out what we're running into here is a 'gap lock' necessary to support ranges in statement based replication (the default). You can change the behavior if you're using row based replication its just not the default.
For instance if the second transaction in your example was "BETWEEN 7 and 10" then both transactions would run concurrently no problem, its acting like row+1 level locking.
This appears to be an issue on how InnoDB handles range row locks, not MySQL (since it's InnoDB that's issuing that lock contention warning). It doesn't happen if you're not using ranges:
i.e. issuing in session 1
BEGIN;
UPDATE locks_suck SET val = 'Transaction 1' WHERE id = 1
and in session 2
BEGIN;
UPDATE locks_suck SET val = 'Transaction 2' WHERE id = 2
doesn't block. These locks are not issued at the MySQL layer when locks_suck is an InnoDB table, they are issued by InnoDB itself (MySQL doesn't actually handle transactions itself, so it would have to be the storage engine).
See dev.mysql.com/doc/refman/5.0/en/innodb-lock-modes.html for more info on how InnoDB issues those locks.
By choosing Pg over MySQL or Mongo, you are selecting a high-quality, stable product that is on a very slow and steady development trajectory with very few missteps along the way.
If you don't want to be surprised by the interesting and unexpected ways your database functions, choose Postgres.
Native JSON support is coming in 9.2 (being released very soon) and will rapidly improve.
PostgreSQL is being used by startups and major enterprises alike to great effect. If you have any interest in it, I encourage you to dive in to some real problems and see how postgres can help you solve them.
Yes, your criticisms may be valid, and postgres is always improving. But if you keep an open mind and actually try to solve real problems with it using all of the tools it has to offer, you may be pleasantly surprised.
Skype was supporting tens of millions of concurrent active users on a sharded PostgreSQL setup years ago (like, 2006-07 or so). Yes, the support wasn't native, and they had to write PgBouncer and PL/Proxy to enable that kind of scalability, but they Open Sourced both projects, and they're pretty widely used in many environments for exactly that purpose.
As far as JSON, the HStore extension has been available some time since 8.3, which was released in 2008. Again, not native (though that's being addressed with 9.2, which should drop any time now), but not particularly difficult to use. A trivial web search shows people using JSON with HStore at least as far back as 2010, if not earlier.
Skype added sharding support themselves. And it requires you to use stored procedures instead of SQL making it unusable for most users.
And I am talking about native JSON support as a first class citizen. Nobody is going to lock their entire data structure to a third party extension that may or may not disappear.
If your intention is store data structures in JSON form -- ie pretty much a pure graph database -- then maybe a relational database engine is a bad fit anyhow.
It's a lot nicer, but I see switching technologies as being a formula: if the benefit minus the cost is greater than zero, switch. The benefits can be gigantic and it can still be better not to switch if the costs are high enough. Your bank still runs COBOL, after all.
Most of the people writing these articles are in the best places to switch: noodling on the side, between gigs, or they're students. They're not running an enterprise supporting millions of litigious customers on hundreds of servers running millions of lines of proprietary software that all depend on MySQL. It would be absolute lunacy to switch in that case; even if Postgres were twice as fast, used half the space and cut your costs in half, it would still be hard to justify the cost and danger.
I think Postgres is a lot better, and anybody for whom the equation justifies it should switch. But that isn't going to be everybody every time, and that's fine, because we should make these decisions rationally rather than because it's cool now or it looks like it will be fun.
As a long-term mysql user who has dabbled with production Postgres machines I still don't see how Postgres really has any huge advantages that would cause me to want to run it for any new projects and deal with a new and possibly daunting learning curve. Postgres seems to do some things better perhaps, and in general seems to have more flexibility than MySQL. But it also seems to lack focus as a end-to-end db solution.
The clustering / replication complexity of Postgres is a huge issue for me, since scaling databases is not a trivial thing in the best of cases. With postgres, there seem to be 10+ different cluster / replication systems (both open-source and commercial?) and it's just a mess.
Giving people tons of options is great, but ease of use for developers and an amazing & simple out of box experience is so important. MySQL won the hearts and minds of developers this way... and now MongoDB is doing the same thing by improving on what MySQL what able to accomplish. The Postgres team should consider implementing something similar.