
Well, it seems to me the difference between rowstores and columnstores might be a little more profound than just row-major and column-major order in C and FORTRAN (which I remember from back in the day).

In columnstores, it's not just the ordering that's advantageous for analytics. It's also the ability to scan only the columns a query needs and skip the rest, and the ability to compress each column, for example with run-length encoding (RLE), which speeds up search when there are many repeated elements (compression also makes scans more cache efficient). This means if your workload only involves a few columns, and your primary operations are filter and aggregate (analytics workloads), then the performance gains are tremendous.
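To make that concrete, here is a rough Python sketch of a filter and an aggregate over a run-length-encoded column. The table, column names, and helper functions are all made up for illustration, and real columnstores do this over packed binary blocks rather than Python lists, so treat it as a toy model of the idea only.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(v, sum(1 for _ in g)) for v, g in groupby(values)]

def count_equal(runs, target):
    """Count matching rows by visiting one entry per run, not one per row."""
    return sum(n for v, n in runs if v == target)

def sum_where(runs, other_column, target):
    """Sum other_column over the row ranges where the RLE column equals target."""
    total, start = 0, 0
    for v, n in runs:
        if v == target:
            total += sum(other_column[start:start + n])
        start += n  # advance to the first row of the next run
    return total

# Hypothetical table stored column by column. The query below touches only
# "country" and "amount"; "note" is never read.
columns = {
    "country": ["DE", "DE", "DE", "FR", "FR", "US", "US", "US", "US"],
    "amount":  [10, 20, 30, 40, 50, 60, 70, 80, 90],
    "note":    ["a", "b", "c", "d", "e", "f", "g", "h", "i"],
}

country_runs = rle_encode(columns["country"])
us_rows = count_equal(country_runs, "US")
us_total = sum_where(country_runs, columns["amount"], "US")
```

Nine rows of "country" collapse to three runs, so the filter looks at three entries instead of nine. The win grows with the length of the runs, which is why sorted or low-cardinality columns compress so well.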

On the other hand, if you want to optimize for row by row writes (OLTP), and if your workloads involve operations on many columns, then a rowstore is advantageous.
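A toy Python sketch of why the write path favors rows, again with invented names and in-memory lists standing in for on-disk pages: inserting a record into a row layout is one append, while a column layout has to touch every column.

```python
# Hypothetical schema; both stores hold the same logical table.
COLUMNS = ("id", "name", "balance")

row_store = []                            # one tuple per record, kept together
col_store = {name: [] for name in COLUMNS}  # one list per column

def insert_row_store(store, record):
    """A single append: the whole record lands in one place."""
    store.append(tuple(record[c] for c in COLUMNS))

def insert_col_store(store, record):
    """One append per column: the record is scattered across all of them."""
    for c in COLUMNS:
        store[c].append(record[c])

def read_record_col_store(store, i):
    """Reassembling a full record means one lookup in every column."""
    return tuple(store[c][i] for c in COLUMNS)

record = {"id": 1, "name": "alice", "balance": 250}
insert_row_store(row_store, record)
insert_col_store(col_store, record)
```

With three columns the difference is small, but with a few hundred columns every single-row insert or full-row read in the columnar layout fans out into that many separate locations.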

Row and column major layouts are more about exploiting sequentiality, but in practice both C and FORTRAN can provide good linear algebra performance.
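For anyone who hasn't run into it, the row-major vs column-major distinction comes down to index arithmetic. A minimal Python sketch (the function names are mine, not from any library):

```python
def row_major_offset(i, j, n_rows, n_cols):
    """C-style layout: elements of the same row are adjacent in memory."""
    return i * n_cols + j

def col_major_offset(i, j, n_rows, n_cols):
    """FORTRAN-style layout: elements of the same column are adjacent."""
    return j * n_rows + i

# Walk along row 0 of a 3x4 matrix under each layout.
n_rows, n_cols = 3, 4
row_walk_c = [row_major_offset(0, j, n_rows, n_cols) for j in range(n_cols)]
row_walk_f = [col_major_offset(0, j, n_rows, n_cols) for j in range(n_cols)]
```

In the row-major case the walk hits consecutive offsets; in the column-major case it strides by the number of rows. Either layout is sequential as long as the loop order matches it, which is all that linear algebra code needs to get right.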

But for analytics workloads on rowstores vs columnstores, the difference in performance can be orders of magnitude. The fastest analytics databases in the world (Kdb, Vertica, Exasol, some of the newer GPU based databases) are almost all columnar.


