Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You can't do updates, but you can create tables using the AggregatingMergeTree table engine.

I've been working out how to use this for tracking count/first seen/last seen for data sets. With normal sql you need to do upserts, but with clickhouse you can just create the table using countState, minState, maxState and insert:

  1, now, now
for count, first, last, and when it compacts the table it'll apply the aggregations.

Basically what you do with with https://github.com/facebook/rocksdb/wiki/merge-operator just with a few lines of SQL instead of a ton of custom code.



Interesting, I'll have an experiment with this today. I think I can see the pattern but I'm not entirely sure it'll work quite as I'm hoping.

So assuming I've got a nice primary key (which thankfully I do for the main data I care about), I could have a table something like (syntax will be wrong but):

    PrimaryKey
    Something
    Otherthings
    maxState(LastUpdated)
Then when selecting I can run queries efficiently to only get the latest version of a row?

The bit I'm not sure on is I feel a lot of this is designed to work nicely with columns, and in this case I have a fairly row-like constraint (latest row with a key).

Should be an interesting day of investigating though, thanks!


Here's a workable example that I came up with:

  CREATE TABLE mt2
  (
    whatever Date DEFAULT '2000-01-01',
    key String,
    value String,
    first AggregateFunction(min, DateTime),
    last AggregateFunction(max, DateTime),
    total AggregateFunction(count, UInt64)
  ) ENGINE = AggregatingMergeTree(whatever, (key, value), 8192)
insert into mt2 (key, value, first,last,total) select 'www.google.com', '1.2.3.4', minState(toDateTime(1498241729)),maxState(toDateTime(1498241729)), countState(cast(1 as UInt64));

insert into mt2 (key, value, first,last,total) select 'www.google.com', '1.2.3.5', minState(toDateTime(1498242729)),maxState(toDateTime(1498242729)), countState(cast(1 as UInt64));

insert into mt2 (key, value, first,last,total) select 'www.google.com', '1.2.3.6', minState(toDateTime(1498242729)),maxState(toDateTime(1498242829)), countState(cast(1 as UInt64));

select key, value, minMerge(first), maxMerge(last), countMerge(total) from mt2 group by key, value;

optimize table mt; -- compact the table and merge multiple rows with the same (key, value)

You have to add the date column, but you can ignore it.

The last query can be a view or even a materialized view.


Thanks!

I was also playing yesterday with CollapsingMergeTree.

    CREATE TABLE cmt
    (
      whatever Date DEFAULT '2000-01-01',
      key String,
      value String
      sign Int8
    ) ENGINE = CollapsingMergeTree(whatever, (key, value), 8192, sign)

Now you can 'delete' a row by sending the same row again but with a sign of -1:

    insert into cmt (key, value, sign) values ('k1', 'v1', 1)
    insert into cmt (key, value, sign) values ('k1', 'v1', -1)
    insert into cmt (key, value, sign) values ('k1', 'v1 update', 1)
    insert into cmt (key, value, sign) values ('k2', 'just delete this one', 1)
    insert into cmt (key, value, sign) values ('k2', 'just delete this one', -1)
You have to either add FINAL onto the query or optimise the table as far as I can tell for this to work, or hope it's done it in the background.

I thought you had to use the sign in the query otherwise it wouldn't work, but creating this example this morning works fine. If you're not getting the response you expect after optimising, try adding the sign column to the query.

    :) select key, value, sign from cmt

    SELECT
        key,
        value,
        sign
    FROM cmt
    
    ┌─key─┬─value─────┬─sign─┐
    │ k1  │ v1 update │    1 │
    └─────┴───────────┴──────┘
    
    1 rows in set. Elapsed: 0.002 sec.
    
    :)

[disclaimer - I know very little about how to make high performance things in clickhouse, tbh the Log format has been easily fast enough for my data so far]




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: