Thanks. Comments like this that go into the technical details, the history of those details and the people behind the technology are one of the main reasons I frequent Hacker News.
Before diving into ways that SQL could improve I'd like to give some thanks to a real workhorse that has proved useful over decades, which is an incredibly long time in tech.
Could it be better? Sure, but the author's proposal doesn't address where I usually have problems. What are my pain points and what would I like to see instead?
1) One giant statement. Personally I really like Hadley Wickham's dplyr [1,2] (think "data pliers"), which has a SQL-like notion of joining different tables and selecting values but splits the filter, mutate and summarise verbs into separate steps in a pipeline rather than one huge statement. For transactions dplyr would have to add an update verb as well.
2) Hard to test, especially for more complex ETL. The dplyr approach highlights that a lot of SQL these days is used in ETL applications in addition to the usual retrieval, transactions and reporting. A query expressed as a pipeline of operations is easier for me to understand, because execution is conceptually sequential and I can unit test the individual steps in a normal programming language environment.
3) My data isn't all tabular. I'd like better support and semantics for non-scalar entries, where a value is itself a record, as in JSON, BigQuery, Hive, Presto, etc.
4) Not that extensible. I'd like better support for user-defined functions (UDFs). More and more frequently I want to apply some non-trivial operation to the data, e.g. run a machine learning model, and it usually makes sense to do that as close to the data as possible. It is possible to do a fair bit in SQL itself with window functions, but it is generally painful. You can point Hive at your jar and run a UDF, but in my experience that is also painful to integrate and debug.
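To make points 1 to 3 a bit more concrete, here is a rough sketch in plain Python of the kind of pipeline I have in mind. This is not dplyr or any real library: the helper names (`filter_rows`, `mutate`, `summarise`), the `revenue_by_country` function and the sample `orders` data are all made up for illustration. Each verb is an ordinary function, so each step can be unit tested on its own, and the rows carry a nested `customer` record rather than only scalar columns.

```python
from collections import defaultdict

# Hypothetical sample data: each row is a dict, and "customer" is a nested record.
orders = [
    {"customer": {"name": "ann", "country": "US"}, "qty": 2, "unit_price": 5.0},
    {"customer": {"name": "bob", "country": "DE"}, "qty": 1, "unit_price": 20.0},
    {"customer": {"name": "cat", "country": "US"}, "qty": 0, "unit_price": 9.0},
    {"customer": {"name": "dan", "country": "US"}, "qty": 3, "unit_price": 4.0},
]


def filter_rows(rows, predicate):
    """Keep only the rows matching the predicate (roughly SQL's WHERE)."""
    return (row for row in rows if predicate(row))


def mutate(rows, **new_columns):
    """Yield copies of the rows with derived columns added; inputs are not modified."""
    for row in rows:
        derived = {name: fn(row) for name, fn in new_columns.items()}
        yield {**row, **derived}


def summarise(rows, by, column, agg):
    """Group rows by by(row) and aggregate one column (roughly GROUP BY)."""
    groups = defaultdict(list)
    for row in rows:
        groups[by(row)].append(row[column])
    return {key: agg(values) for key, values in groups.items()}


def revenue_by_country(rows):
    """The whole 'query', written as consecutive steps instead of one statement."""
    non_empty = filter_rows(rows, lambda r: r["qty"] > 0)
    priced = mutate(non_empty, total=lambda r: r["qty"] * r["unit_price"])
    return summarise(
        priced,
        by=lambda r: r["customer"]["country"],  # reaches into the nested record
        column="total",
        agg=sum,
    )


print(revenue_by_country(orders))  # {'US': 22.0, 'DE': 20.0}
```

The point is less the specific helpers than the shape: every intermediate step is a value I can inspect or assert on, which is hard to do inside one giant SQL statement.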
I'd like to see Netflix invest more in making a great browsing and discovery experience. I feel like their UI is now really optimized for a "watch something quick" metric. Search seems pretty well implemented these days, and both YouTube and Netflix recommendations have come a long way, but it still seems hard to replicate the browsing experience I get at the local bookstore with some rudimentary sections and staff picks.
+1 to this post by koliber. The only thing I would add is to plan everything beforehand in as much detail as you can. Draw it up, visualize it. What sort of hinges go on the doors, how will the cabinets swing, etc. Before you even start building a wall you have to decide how wide you want the trim around the doorway, because you'll want the light switch just that far from the door and you'll want a 2x4 there with electrical. So there are a lot of cascading, intertwined, non-obvious decisions, and it is remarkably expensive to refactor a house partway through construction.
First, try to understand a process. Draw it. Imagine going through the process of building it. Draw it again. Look at your friend's house to see how it is different. Try to imagine how that one was built.
I have an interesting tale of how the patio door installer almost caused my kitchen countertops to be installed 2 inches lower. The story is a bit convoluted because many things are tied together in the house. You have levels for your foundations, sills, door bottoms, door tops, window tops, subfloors, and floors. Different layers have different widths. Wood flooring ends up being a certain width, and tile flooring another. If you want things to be flush when you're done, you need to plan everything meticulously, backwards, across months of time and different crews.
Moral of the story: pay attention to the levels of everything, across the house, all the time, in one place. Double-check this and verify all the time.
Are the software tools used to draw up a blueprint smart enough to take into account the local code, comfortable spacing/clearances, etc.? Is there a blueprint equivalent of lint?
Almost certainly not for local code. My experience is that building codes make the IRS tax code look clear and concise. Often multiple conflicting rules apply and you're trying to hammer out some reasonable common sense with 2-3 different people in the planning office, one of whom is almost certain to go on vacation while you're in the permitting process. You'd be lucky to get the city plan reviewer, architect and contractor all in the same room for an extended discussion.
There are architectural programs that will render a 3D space to help with visualization. Any 3D drawing program such as AutoCAD can help as well.
Generally the linting is done manually when you ask for bids on your plans from contractors. If your plans are really poor they won't bid or they will come back with suggestions or prices that incentivize you to rethink things.
I really like notebooks as a way to share an analysis or data visualization. For me the biggest benefits are:
- Integration. It is so nice to have access to the compute, data and libraries all in one place. There is a surprising amount of hassle in moving data, setting up paths and libraries, etc. Notebooks act almost like a container for a data analysis rather than for a service.
- Sharable and reproducible. My coworkers can reproduce and explore some new idea with almost no effort, which is especially important when their strengths are more ML or stats than devops.
- Literate programming. It is really nice to have plots, markdown and code all in one place when the deliverable is a report or analysis rather than code.
Even with these benefits, I do think the criticisms about software development are right on point. Notebooks are a step backward as a software engineering environment, with none of the modern tooling (version control, testing frameworks, etc.). I think the folks who dismiss notebooks as a platform, though, are missing some important benefits that have long been absent from current editors. Larger companies like Facebook and Google are already facing the reality that devops is a pain point even for sophisticated software engineers, and have developed remote code editors like Cider and Nuclide to bring code development to the data and compute rather than doing it from the laptop. R has been working for a long time on integrating analysis results and code in a reproducible package with Sweave/knitr, and Python now has Pweave in a similar vein.
I hear all the issues with Jupyter and I'm not particularly married to the current form of notebooks. I do think, though, that remote development, data visualization, and support for literate/report programming and sharing code are first-class features that I'll continue to want in the future.
Completely agree about missing the actual benefit to society of focusing on better education and opportunities for all. Amazing how quickly things can go off the rails when you start evaluating middle management on some metric like "number of diversity hires", which seems like a good idea but then backfires into something ridiculous like this. I hope this lights a fire under Google's management to keep better track of HR.
I think we can all agree that bad managers suck and managers who pretend to be technical particularly suck. However, it feels to me like we greatly underestimate the difficulty of being a technical manager and grievously undertrain them. Google has gone past one-off personal experiences and studied managers across lots of teams and projects[1], and being a technical expert wasn't even close to the most important trait. I'd like to see us spend less time proposing "no managers" because the current ones aren't good, and more time figuring out how to make them good and giving them the training and support they need. Most of the ones I know want to be successful and are trying in their own imperfect way. If you think the most important thing is technical prowess, just take a look at any academic CS department: run only by super-technical professors, it is often a case study in anti-patterns.
From Google's study on managers, as reported in the nytimes[2]:
“In the Google context, we’d always believed that to be a manager,
particularly on the engineering side, you need to be as deep or deeper
a technical expert than the people who work for you,” Mr. Bock says.
“It turns out that that’s absolutely the least important thing.
It’s important, but pales in comparison. Much more important is just making
that connection and being accessible.”
I like this idea. I wish more of the places I have worked had explicit guidelines around how to work as a team (e.g. don't trivialize work that isn't trivial). It doesn't take many poorly timed comments to shut down a team's discussion.