Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Thanks! I guess I had some gaps in my mental model of how postgres handles these things, this clears it up quite a bit!

Just to confirm/repeat back:

* When you say DDL operations (ALTER TABLE) will "never modify data", they're still allowed to add a column with a non-null default since that's not "modifying data" by your definition, it's picking the value to use when introducing new data. I shouldn't think of it as "set implicit null to 3", I should think of it as "set nothingness to 3", just like how it would "set nothingness to null" in normal column creation. But it would never be willing to "set null to 3" in a DDL operation.

* Postgres doesn't have an O(1) way (meaning: never does a table scan while holding a significant lock) to transition a nullable column to non-null, since it'll need to check the contents of the entire table. (Although maybe it could use an index?) I guess I was hoping that the default would help here, but really it only affects creation of new values.

* More generally speaking, Postgres's design direction is to put data integrity as a higher priority than scale. It'll do both if it can (like in the article), but there are still cases where schema changes might cause user disruption if done on a gigantic table. It could have been designed so that schema changes are always fast (like they tend to be in NoSQL systems), but that would come at the expense of other nice properties.



* Correct. Adding a column requires providing a value for that column, and it will generally use NULL or the default value of the column when doing so. It does that because it has to (a column must have a value even if it's NULL). The only other time you may see DDLs "changing" data is certain data type changes. For example, if you change an approximate numeric data type to a precise numeric data type (e.g., float vs decimal) or vice-versa or some other data type with different levels of precision (e.g., SQL Server datetime vs datetime2), but you may still get errors or warnings.

* Correct. Adding a constraint or unique index will always require scanning the data to complete the transaction to ensure that it's consistent with the new rules. It's safe to assume that modifying the schema of a table itself in essentially any way will always require an exclusive table lock to complete. Even creating an index will often lock a table. PostgreSQL does allow you to create some indexes without a lock with the CONCURRENTLY option [0] and other RDBMSs may have similar options, but that's an additional feature that you'll want to check in your specific implementation with their own caveats.

* Correct. The design of the relational model which traditional SQL RDBMSs implement as a whole is to focus on ACID [1] compliance and strict determinism. It's focused on being correct first and foremost, and things like performance and availability take a backseat to that (which is where NoSQL data stores step in with different focuses). You wouldn't want to have that kind of focus on a social media site where it's more important to be available first and who cares if a bit of data is lost or broken or out of date now and then, but a financial, government, or medical system must store data correctly and must correctly retrieve data that it claimed to store correctly. Accuracy is paramount in those systems. This is one reason why DBAs have fairly universal reputations as extremely cautious perfectionists (the other reason is that if you don't design your tables and normalize your data it becomes extremely difficult to work with down the line).

MySQL is somewhat notorious for breaking many of the above rules, especially prior to MySQL 5.0. That is half the reason why it tends to have such a poor reputation among DBAs (the other reason being that Oracle is a terrible vendor). MySQL used to allow you to store invalid dates, would silently modify or truncate data rather than error, etc. It's much, much better now, but reputations are hard to repair in a world of strict rules and deterministic functions.

[0]: https://www.postgresql.org/docs/9.1/static/sql-createindex.h...

[1]: https://en.wikipedia.org/wiki/ACID_(computer_science)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: