Reading this reminded me a little of old works of philosophy. I found it tremendously educational, not just about postgres partitioning, but about databases in general. That degree of patience and thoroughness on the part of the teacher isn't something you see every day.
I wish more such conversations A) took place, and B) were transcribed and shared in a similar fashion.
This is quite a good informal high-level explanation.
A thing I can't understand is why this isn't made automatic.
If there are 100 companies, you don't want to create 100 child tables, then add 100 CHECK constraints, and then do it all again each time a new company appears, do you? Wouldn't "syntactic sugar" like "PARTITION BY company_id" that does all this automatically be possible?
Just asking if there is a reason something like this is not implemented.
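For context, here's roughly the manual boilerplate such sugar would generate. This is a sketch; the table and constraint names are made up, not from the original article:

```sql
-- What a hypothetical "PARTITION BY company_id" might expand to.
CREATE TABLE employees (
  id         serial,
  company_id integer NOT NULL,
  name       text
);

-- One child per company, with a CHECK constraint so the planner can
-- use constraint exclusion. Repeat for each new company:
CREATE TABLE employees_company_1 (
  CHECK (company_id = 1)
) INHERITS (employees);

CREATE TABLE employees_company_2 (
  CHECK (company_id = 2)
) INHERITS (employees);
```

The repetitive part (new child + new CHECK constraint for every new company) is exactly what people want automated.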
I'm experiencing the pain. I've been championing postgres in my company for a couple of years now. The partitioning limitations have been quite embarrassing. It's not just the annoying lack of helper syntax to make partitioning better. It's the actual huge limitations on what the planner can consider when trying to route a query to the right child table.
It pains me to say this, but when I was working with Oracle, its partitioning support was miles ahead of postgres (though not perfect by any stretch of the imagination).
i suspect this was the simple solution that the postgresql team could implement easily.
that is actually why i wrote the partitioned gem (which this explanation is a part of): https://github.com/fiksu/partitioned
the lack of automatic table management is pretty painful and the gem helps you manage such things.
Why can't they just improve the indexing to do this automatically behind the scenes? Isn't the problem of finding things quickly in exchange for more work during a write exactly what indexing is supposed to solve?
Well, be specific about what indexing is: it layers a B-tree on top of what programmers would think of as an array structure. Indexes are not just something which is always faster than a sequential scan in exchange for slightly slower writes - there are important cases where the index is slower than just doing a sequential scan. Obviously you hope the query planner will avoid using an index in these cases, but the important thing to bear in mind is that indexes are not magic go-faster stripes :)
The problem happens when this B-tree is too large to fit into your working memory: swapping parts of the B-tree in and out of memory repeatedly to answer a query is very slow. The solution that partitioning offers is to allow you to split your indexes (and your data) so that, depending on your access-patterns you could see less swapping of that index.
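As a concrete sketch of that: partition by a time range so the "hot" current partition's index stays small enough to live in memory while cold partitions' indexes are rarely touched. Names here are illustrative:

```sql
-- Parent table plus one monthly partition (inheritance-style).
CREATE TABLE events (
  id        serial,
  logged_at timestamptz NOT NULL,
  payload   text
);

CREATE TABLE events_2014_01 (
  CHECK (logged_at >= '2014-01-01' AND logged_at < '2014-02-01')
) INHERITS (events);

-- Each partition gets its own, much smaller, index.
CREATE INDEX ON events_2014_01 (logged_at);
```

Queries that filter on `logged_at` only need to swap in the small per-partition B-tree, not one giant index over all history.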
For example: you are keeping daily logs (one partition per day), and you only want to keep the last 30 days, so you DROP TABLE the oldest partition, which is much faster than deleting that many rows individually.
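The difference looks like this (partition naming is illustrative):

```sql
-- Retiring a day's worth of logs by dropping its partition:
-- essentially just unlinks the table's files, no per-row work,
-- no dead tuples, no VACUUM needed afterwards.
DROP TABLE logs_2014_01_01;

-- versus the row-by-row alternative on an unpartitioned table,
-- which touches every matching row and its index entries:
DELETE FROM logs WHERE logged_at < now() - interval '30 days';
```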
Somehow http://www.postgresql.org/docs/current/static/ddl-inherit.ht... has always stopped me from using table inheritance; you don't get unique constraints over the table and its child tables, so for example you can't have an autoincrement primary key that works safely across them all. (Disclaimer: I haven't run into large enough tables yet that I'd need partitioning).
If this constraint was lifted, table inheritance would allow some really cool things, but currently I'm too conservative to use it in production.
Sequences are guaranteed to produce unique values. Just manually configure each child table to use the same sequence for the primary key column. Yes, uniqueness won't be enforced on the parent table, but the values will still be unique (unless you manually update them in your queries)
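A minimal sketch of that workaround, with illustrative names:

```sql
-- One shared sequence feeding every child table's primary key.
CREATE SEQUENCE employees_id_seq;

CREATE TABLE employees_company_1 (
  id   integer PRIMARY KEY DEFAULT nextval('employees_id_seq'),
  name text
);

CREATE TABLE employees_company_2 (
  id   integer PRIMARY KEY DEFAULT nextval('employees_id_seq'),
  name text
);

-- Each child enforces uniqueness locally; values stay globally
-- unique because they all draw from the same sequence.
```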
Sequences won't roll over by default. nextval() fails if you're out of values. You can create a sequence which is allowed to roll over (by specifying cycle), but it's not the default.
Yes. It's not optimal and you can shoot yourself in the foot, but it works in its default configuration. I just gave a workable workaround for a problem, I wasn't saying it's the perfect solution (a working unique constraint on the parent table would be)
I'm no database expert, but I don't think the features you want are possible, really. To get a unique constraint over multiple tables you'd have to lock them all, and if you lock them all you have lost the performance benefit of partitioning tables.
There are things you could do, like key on a GUID, that would ensure each insert has a unique ID in a lock-free way. But the semantics of that are going to be very different from a traditional autoincrementing ID.
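Sketch of the GUID approach (uses the uuid-ossp extension; pgcrypto's gen_random_uuid() also works):

```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Every insert gets a globally unique key without coordinating
-- across partitions -- but you lose ordering and compactness
-- compared to an autoincrementing integer.
CREATE TABLE employees (
  id   uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  name text
);
```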
Well, the locks would only be necessary for UPDATEs (or INSERTs where the constrained column doesn't come from the sequence), and that's a price I'd be willing to pay (hint: how often do you update a primary key column?)
yes, the parent is also a target table -- but for partitioning you never put anything in the parent. I could probably talk about the troubles of putting data in the parent table.
and yes, a trigger function is possible -- and I could have mentioned it (although this was written as an explanation of partitioning using the gem I created). I should mention the alternatives and why they weren't employed.
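For anyone curious, the trigger alternative looks roughly like this. It's a sketch with made-up names, not the gem's actual code:

```sql
-- Route inserts aimed at the parent into the right child table.
CREATE OR REPLACE FUNCTION employees_insert_router()
RETURNS trigger AS $$
BEGIN
  IF NEW.company_id = 1 THEN
    INSERT INTO employees_company_1 VALUES (NEW.*);
  ELSIF NEW.company_id = 2 THEN
    INSERT INTO employees_company_2 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'no partition for company_id %', NEW.company_id;
  END IF;
  RETURN NULL;  -- row is already in a child; suppress the parent insert
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER employees_insert
  BEFORE INSERT ON employees
  FOR EACH ROW EXECUTE PROCEDURE employees_insert_router();
```

The downside is that the IF/ELSIF chain has to be regenerated every time a partition is added or dropped, which is part of what the gem automates.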
Great intro to Pg partitioning. I understand partitions can improve performance of a single query by scanning fewer rows, but do any of you have an idea about the throughput implications? Is partitioning better or worse when it comes to 1000 small queries/updates per second? Thanks!
Can this be used to enforce a temporal limit on data? That is, if the check is for time to be within 24 hours of now will it purge older than 24 hours data as time progresses? My gut tells me no, but my gut is often wrong when it comes to databases.
if I understand you correctly, this type of thing is implemented by partitioning by hour in a day and then having a janitor process drop tables that are older than 24 hours.
Could you use a check constraint of FALSE on the parent table to prevent rows from being erroneously inserted there instead of the actual partition tables?
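Concretely, the idea would be something like this (untested sketch; note the NO INHERIT, available since 9.2, because a plain CHECK constraint on the parent would propagate to the children and block their inserts too):

```sql
-- Reject any row inserted directly into the parent table,
-- without affecting the child partitions.
ALTER TABLE employees
  ADD CONSTRAINT no_rows_in_parent CHECK (false) NO INHERIT;
```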