I’m sure someone is out there claiming that AI is going to solve all your business’s problems no matter what they are. Remotely sane people are saying it will solve (or drastically improve) certain classes of problems. 3x code? Sure. 3x the physical hardware in a data center? Surely not.
Nope. Public employee unions bring zero value and this incident is not evidence to support such unions. Relying on unions to act as ersatz safety regulators would be stupid, just completely the wrong approach. Decisions about things like ATC procedures, staffing levels, and training standards should be the responsibility of apolitical career bureaucrats.
Why would a career bureaucrat be a more efficient way to figure out how to attract and retain ATC workers, as opposed to a union representing those ATC workers?
Your proposal intentionally injects inefficiency and noise into the system because you don't like some political boogeyman.
In addition to that, what they don’t mention is that:
1. Other app stores like Google Play and Steam haven’t seen this rapid rise.
2. There are thousands, maybe tens of thousands, of apps that are just wrappers calling OpenAI APIs, or similar low-effort AI apps, and they make up a large percentage of this increase.
3. There are billions of dollars pouring into AI startups and many of them launch an iOS app.
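For a sense of how thin the "wrapper" apps in point 2 can be, here's a minimal sketch of what many of them amount to: take the user's text, bolt on a canned system prompt, and forward it to someone else's model. The model name and prompt are illustrative stand-ins, not details from the thread; this just builds the request payload rather than sending it.

```python
# Minimal sketch of a "wrapper" app: nearly all of its logic is
# assembling a chat-completion request for someone else's model.
# Model name and system prompt are illustrative assumptions.

def build_request(user_text: str) -> dict:
    """Build the JSON payload a thin wrapper app would POST to an LLM API."""
    return {
        "model": "gpt-4o-mini",  # assumed model name, for illustration
        "messages": [
            {"role": "system", "content": "You are a recipe assistant."},
            {"role": "user", "content": user_text},
        ],
    }

payload = build_request("Give me a 20-minute pasta recipe.")
print(payload["model"])         # the wrapper contributes no model of its own
print(len(payload["messages"])) # just a canned prompt plus the user's text
```

Everything differentiating one such app from another fits in that canned system prompt, which is why thousands of them can appear in a year.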
Has Steam not seen a rapid rise in AI-asset shovelware?
I'm not talking about the AAA or the AA or even the A space (where AI is being incorporated into dev processes with various degrees of both success and low effort slop), I'm talking about the actual bottom of the barrel.
You never needed AI to make shovelware; you've been able to make a shitty game over a weekend ever since RPG Maker came out, and there are still games made with it.
AI just helps create some assets for games; it doesn't really make games easier or faster to make, but they might look a bit better.
I can’t speak to the quality of all the games released, but in January 2025 there were 1,413 games released on Steam and in January of this year there were 1,448.
> It's like that FT chart claiming that the rapid rise in iOS apps is evidence of an AI-fueled productivity boom.
I mean, there is evidence for some change. Personally, I'm sceptical of what this will amount to, but prior to EOY 2025, there really wasn't any evidence for an app/service boom, and now there's weak evidence, which is better than none.
Because so much technical functionality has been lost/paywalled/dark patterned/enshittified, I've cut the number of apps I use. I've realized building core personal functionality around the whims of corporations eventually just gets weaponized against me, so I might as well start undoing that on my own terms. Who in 2026 is really bringing in a new app/SaaS to do much of anything like we naively did a decade ago? No one I know; we've been shown we'll be treated as suckers for doing that.
The bird not having wings, but all of us calling it a 'solid bird' is one of the most telling examples of the AI expectations gap yet. We even see its own reasoning say it needs 'webbed feet' which are nowhere to be found in the image.
This pattern of considering 90% accuracy (like the level we've seemingly stalled out at on the MMLU and AIME) to be 'solved' is really concerning for me.
AGI has to be 100% right 100% of the time to be AGI and we aren't being tough enough on these systems in our evaluations. We're moving on to new and impressive tasks toward some imagined AGI goal without even trying to find out if we can make true Artificial Niche Intelligence.
This test is so far beyond AGI. Try to spit out the SVG for a pelican riding a bicycle. You are only allowed to use a simple text editor. No deleting or moving the text cursor. You have 1 minute.
As for MMLU, is your assertion that these AI labs are not correcting for errors in these exams and then self-reporting scores less than 100%?
As implied by the video, wouldn't it then take 1 intern a week max to fix those errors and allow any AI lab to become the first to consistently 100% the MMLU? I can guarantee Moonshot, DeepSeek, or Alibaba would be all over the opportunity to do just that if it were a real problem.
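To put rough numbers on the point about errors in the exam itself: if some fraction of a benchmark's answer keys are simply wrong, that caps the score even a perfectly correct model can get when graded against the key. A back-of-the-envelope sketch, where the 3% error rate is purely an illustrative assumption, not a measured MMLU figure:

```python
# Back-of-the-envelope: how mislabeled answer keys cap a graded score.
# The 3% error rate below is an illustrative assumption, not a measured figure.

def max_graded_score(key_error_rate: float, lucky_match_rate: float = 0.0) -> float:
    """Best score a model that answers every question correctly can get
    when graded against a key in which `key_error_rate` of entries are wrong.
    `lucky_match_rate` is the chance a wrong key entry matches anyway."""
    return (1 - key_error_rate) + key_error_rate * lucky_match_rate

print(max_graded_score(0.03))  # ~0.97: a perfect model still "misses" ~3%
```

So if scores have stalled around 90% and the key error rate is small, the gap is mostly real model error, not grading noise, which is the force of the "one intern, one week" argument.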
Yeah, I've found AI 'miracle' use-cases like these are most obvious for wealthy people who stopped doing things for themselves at some point.
Typing 'Find me reservations at X restaurant' and getting unformatted text back is way worse than just going to OpenTable and seeing a UI that has been honed for decades.
If your old process was texting a human to do the same thing, I can see how Clawdbot seems like a revolution though.
Same goes for executives who vibecode in-house CRM/ERP/etc. tools.
We all learned the lesson that mass-market IT tools almost always outperform in-house, even with strong in-house development teams, but now that the executive is 'the creator,' there's significantly less scrutiny on things like compatibility and security.
There's plenty real about AI, particularly as it relates to coding and information retrieval, but I've yet to see an agent actually do something that even remotely feels like the result of deep and savvy reasoning (the precursor to AGI) - including all the examples in this post.
> Typing 'Find me reservations at X restaurant' and getting unformatted text back is way worse than just going to OpenTable and seeing a UI that has been honed for decades.
You're conflating the example with the opportunity:
"Cancel Service XXX" where the service is riddled with dark patterns. Giving everyone an "assistant" that can do this is a game changer. This is why a lot of people who aren't that deep in tech think open claw is interesting.
> We all learned the lesson that mass-market IT tools almost always outperform in-house
Do they? Because I know a lot of people who have (as an example) terrible setups with Salesforce that they have to use.
> We all learned the lesson that mass-market IT tools almost always outperform in-house,
Funny, I learned the exact opposite lesson. Almost all software sucks, and a good way for it not to suck is to know where the developer is and go tell them their shit is broken, in person.
If you want a large-scale example, one of the two main law enforcement agencies in France spun off LibreOffice into their own legal-writing software. Developed by LEOs who can take up to two weeks a year to work on it. Awesome software. Would cost literally millions if bought on the market.
One of the most important details of Sacks's life which dogged him nearly to the end (and which is important to this NY piece), was a minimization by Sacks of his own sexuality. He was not "openly gay" at all.
One of the biggest problems frontier models will face going forward is how many tasks require expertise that cannot be achieved through Internet-scale pre-training.
Any reasonably informed person realizes that most AI start-ups looking to solve this are not trying to create their own pre-trained models from scratch (they will almost always lose to the hyperscale models).
A pragmatic person realizes that they're not fine-tuning/RL'ing existing models (that path has many technical dead ends).
So, a reasonably informed and pragmatic VC looks at the landscape, realizes they can't just put all their money into the hyperscale models (LPs don't want that), and looks for start-ups that take existing hyperscale models and expose them to data that wasn't in their pre-training set, hopefully in a way that's useful to some users somewhere.
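In practice, "expose existing models to data that wasn't in their pre-training set" usually means some form of retrieval: fetch the customer's private documents, put the relevant ones into the prompt, and let the hyperscale model do the reasoning. A minimal sketch, where the word-overlap scoring and the document list are toy stand-ins for a real embedding-based retriever and document store:

```python
# Toy retrieval-augmented prompting: the startup's value is the private
# docs plus the glue, not the model. Naive word overlap stands in for a
# real embedding-based retriever; the documents are invented examples.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by word overlap with the query; return the top k."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend retrieved context that was never in any pre-training set."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

private_docs = [
    "Q3 refund policy: refunds within 30 days.",
    "Office wifi password rotation schedule.",
    "Refund exceptions require manager approval.",
]
print(build_prompt("What is the refund policy?", private_docs))
```

The prompt then goes to a hyperscale model as-is, which is why the study can describe these companies as wrappers: the retrieval glue is often the entire product.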
To a certain extent, this study is like saying that Internet start-ups in the 90's relied on HTML and weren't building their own custom browsers.
I'm not saying that this current generation of start-ups will be as successful as Amazon and Google, but I just don't know what the counterfactual scenario is.
The question the article doesn't completely answer is how useful these pipelines actually are for the startups. The article certainly implies that for at least some of them there's very little value-add in the wrapper.
Got any links to explanations of why fine tuning open models isn’t a productive solution?
Besides renting the GPU time, what other downsides exist on today’s SOTA open models for doing this?