Hannes and I developed both MonetDBLite and DuckDB, precisely for the need that you described :) We noticed that there was no easy-to-use RDBMS aimed at single-machine analytical workloads, even though these kinds of workloads are very common (e.g. R/Python data science workloads).
MonetDBLite was our initial approach, and is essentially an embedded version of MonetDB. We wrote a paper about it (https://arxiv.org/pdf/1805.08520.pdf). While it works, MonetDB was not built with embeddability in mind, and we had to rewrite a lot of code to make it embeddable. Because of that we ended up with a fork, as the rewrite was too big to be merged back upstream, which caused a lot of problems with the fork becoming outdated and a lot of headaches with constantly merging changes.
MonetDBLite also had a number of issues stemming from the fact that the original system was designed as a stand-alone database. For example, once started in-process it could not be shut down again, because the stand-alone system relied on process exit to clean up certain parts of itself.
In total, the features we wanted that would not be possible to implement in MonetDB without huge rewrites are as follows:
* Multiple active databases in the same process (reading different database files; see the sketch after this list)
* Multiple processes reading the same database file
* In-database shutdown/restart
* Single-file database format
* Dependency-free system
* Single compilation file (similar to the SQLite amalgamation https://www.sqlite.org/amalgamation.html)
* Control over resource/memory usage of the database system
* Vectorized execution engine
* Compressed storage and compressed execution
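To make the first and third points concrete, here is a minimal sketch of what that looks like through the Python client (the exact API surface may still change before v1.0; file names are made up):

    import duckdb

    # two independent databases open in the same process
    sales = duckdb.connect('sales.db')
    logs = duckdb.connect('logs.db')

    sales.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, amount DOUBLE)")
    logs.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER, msg VARCHAR)")

    # and both can be shut down in-process, without exiting
    sales.close()
    logs.close()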
Because of that (and increasing frustration with constantly merging changes) we opted to develop a new system instead of sticking with MonetDB, as rewriting the entire system to get those features would likely be more work than just starting from scratch (and would not be politically feasible either ;)).
The result of this is DuckDB. While it is still early in the process, it is relatively stable and we hope to ship a v1.0 sometime this year, along with an updated website :)
You are doing a fantastic job, and I wish you the best of luck!
I have used only the Python API of both DBs, and what confused me is the mandatory dependency on NumPy and Pandas. ndarray/DataFrame retrieval and conversion should surely be optional. Some applications do not need these features and can get by with the built-in types (mine just uses fetchall()).
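For example, something like this is all my application needs (sketched with the current DuckDB Python client; exact calls may differ by version):

    import duckdb

    con = duckdb.connect(':memory:')
    cur = con.cursor()
    cur.execute("SELECT 42 AS answer, 'hello' AS greeting")
    print(cur.fetchall())  # [(42, 'hello')] -- plain Python tuples, no ndarray/DataFrame conversion involved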
Out of curiosity, do you plan on binding in common functions for DS/ML use cases? Things like:
- String similarity measures
- ROC-AUC/MSE/correlation/precision/recall, etc.
- LSH
- Sampling/joining with random records.
Keeping all of the transformation/prep logic in the SQL engine seems like a big performance win over Python, and would also cut down the dev time for building the code surrounding the ML functionality.
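For the sampling/joining point, what I have in mind is roughly this (toy tables and the current DuckDB Python client, just as a sketch):

    import duckdb

    con = duckdb.connect(':memory:')
    # made-up toy tables so the query has something to run against
    con.execute("CREATE TABLE candidates AS SELECT range AS id, random() AS feat FROM range(100000)")
    con.execute("CREATE TABLE labels AS SELECT range AS id, CASE WHEN random() > 0.5 THEN 1 ELSE 0 END AS label FROM range(100000)")

    # sample and join inside the engine; only the final rows cross into Python
    rows = con.execute("""
        SELECT c.id, c.feat, l.label
        FROM (SELECT * FROM candidates ORDER BY random() LIMIT 1000) AS c
        JOIN labels AS l USING (id)
    """).fetchall()
    print(len(rows), rows[0])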
We already have a number of statistical ops (e.g. correlation) available, and we are planning to add more. I cannot promise an exact timeline, but feel free to open issues for the specific operations you are interested in or think would be useful. We are always happy to review PRs as well :)
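For example, correlation can already run entirely inside the engine; roughly like this through the Python client (made-up table, just a sketch):

    import duckdb

    con = duckdb.connect(':memory:')
    # made-up table just to have something to aggregate over
    con.execute("CREATE TABLE obs AS SELECT random() AS x, random() AS y FROM range(1000)")

    # Pearson correlation computed inside the engine, no round-trip through Python
    print(con.execute("SELECT corr(x, y) FROM obs").fetchall())  # ~0 for independent columns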