I’m sure someone is out there claiming that AI is going to solve all your business’s problems no matter what they are. Remotely sane people are saying it will solve (or drastically improve) certain classes of problems. 3x code? Sure. 3x the physical hardware in a data center? Surely not.
Nope. Public employee unions bring zero value and this incident is not evidence to support such unions. Relying on unions to act as ersatz safety regulators would be stupid, just completely the wrong approach. Decisions about things like ATC procedures, staffing levels, and training standards should be the responsibility of apolitical career bureaucrats.
Why would a career bureaucrat be a more efficient way to figure out how to attract and retain ATC workers, as opposed to a union representing those ATC workers?
Your proposal intentionally injects inefficiency and noise into the system because you don't like some political boogeyman.
In addition to that, what they don’t mention is that:
1. Other app stores like Google Play and Steam haven’t seen this rapid rise.
2. There are thousands, maybe tens of thousands, of apps that are just wrappers calling OpenAI APIs, or similar low-effort AI apps, and they make up a large percentage of this increase.
3. There are billions of dollars pouring into AI startups and many of them launch an iOS app.
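For a sense of how thin the "wrapper" apps in point 2 can be, here's a minimal sketch of what many of them amount to: take the user's text, bolt on a canned system prompt, and forward it to someone else's model. The model name and prompt are illustrative stand-ins, not details from the thread; this just builds the request payload rather than sending it.

```python
# Minimal sketch of a "wrapper" app: nearly all of its logic is
# assembling a chat-completion request for someone else's model.
# Model name and system prompt are illustrative assumptions.

def build_request(user_text: str) -> dict:
    """Build the JSON payload a thin wrapper app would POST to an LLM API."""
    return {
        "model": "gpt-4o-mini",  # assumed model name, for illustration
        "messages": [
            {"role": "system", "content": "You are a recipe assistant."},
            {"role": "user", "content": user_text},
        ],
    }

payload = build_request("Give me a 20-minute pasta recipe.")
print(payload["model"])         # the wrapper contributes no model of its own
print(len(payload["messages"])) # just a canned prompt plus the user's text
```

Everything differentiating one such app from another fits in that canned system prompt, which is why thousands of them can appear in a year.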
Has Steam not seen a rapid rise in AI-asset shovelware?
I'm not talking about the AAA or the AA or even the A space (where AI is being incorporated into dev processes with various degrees of both success and low effort slop), I'm talking about the actual bottom of the barrel.
You never needed AI to make shovelware; you've been able to make a shitty game over a weekend ever since RPG Maker came out, and there are still games made with it.
AI just helps create some assets for games; it doesn't really make games easier or faster to make, but they might look a bit better.
I can’t speak to the quality of all the games released, but in January 2025 there were 1,413 games released on Steam and in January of this year there were 1,448.
> It's like that FT chart claiming that the rapid rise in iOS apps is evidence of an AI-fueled productivity boom.
I mean, there is evidence for some change. Personally, I'm sceptical of what this will amount to, but prior to EOY 2025, there really wasn't any evidence for an app/service boom, and now there's weak evidence, which is better than none.
Because so much technical functionality has been lost/paywalled/dark patterned/enshittified, I've cut the number of apps I use. I've realized building core personal functionality around the whims of corporations eventually just gets weaponized against me, so I might as well start undoing that on my own terms. Who in 2026 is really bringing in a new app/SaaS to do much of anything like we naively did a decade ago? No one I know; we've been shown we'll be treated as suckers for doing that.
The bird not having wings, but all of us calling it a 'solid bird' is one of the most telling examples of the AI expectations gap yet. We even see its own reasoning say it needs 'webbed feet' which are nowhere to be found in the image.
This pattern of considering 90% accuracy (like the level we've seemingly stalled out at on the MMLU and AIME) to be 'solved' is really concerning for me.
AGI has to be 100% right 100% of the time to be AGI and we aren't being tough enough on these systems in our evaluations. We're moving on to new and impressive tasks toward some imagined AGI goal without even trying to find out if we can make true Artificial Niche Intelligence.
This test is so far beyond AGI. Try to spit out the SVG for a pelican riding a bicycle. You are only allowed to use a simple text editor. No deleting or moving the text cursor. You have 1 minute.
As for MMLU, is your assertion that these AI labs are not correcting for errors in these exams and then self-reporting scores less than 100%?
As implied by the video, wouldn't it then take 1 intern a week max to fix those errors and allow any AI lab to become the first to consistently 100% the MMLU? I can guarantee Moonshot, DeepSeek, or Alibaba would be all over the opportunity to do just that if it were a real problem.
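To put rough numbers on the point about errors in the exam itself: if some fraction of a benchmark's answer keys are simply wrong, that caps the score even a perfectly correct model can get when graded against the key. A back-of-the-envelope sketch, where the 3% error rate is purely an illustrative assumption, not a measured MMLU figure:

```python
# Back-of-the-envelope: how mislabeled answer keys cap a graded score.
# The 3% error rate below is an illustrative assumption, not a measured figure.

def max_graded_score(key_error_rate: float, lucky_match_rate: float = 0.0) -> float:
    """Best score a model that answers every question correctly can get
    when graded against a key in which `key_error_rate` of entries are wrong.
    `lucky_match_rate` is the chance a wrong key entry matches anyway."""
    return (1 - key_error_rate) + key_error_rate * lucky_match_rate

print(max_graded_score(0.03))  # ~0.97: a perfect model still "misses" ~3%
```

So if scores have stalled around 90% and the key error rate is small, the gap is mostly real model error, not grading noise, which is the force of the "one intern, one week" argument.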
Yeah, I've found AI 'miracle' use-cases like these are most obvious for wealthy people who stopped doing things for themselves at some point.
Typing 'Find me reservations at X restaurant' and getting unformatted text back is way worse than just going to OpenTable and seeing a UI that has been honed for decades.
If your old process was texting a human to do the same thing, I can see how Clawdbot seems like a revolution though.
Same goes for executives who vibecode in-house CRM/ERP/etc. tools.
We all learned the lesson that mass-market IT tools almost always outperform in-house, even with strong in-house development teams, but now that the executive is 'the creator,' there's significantly less scrutiny on things like compatibility and security.
There's plenty real about AI, particularly as it relates to coding and information retrieval, but I've yet to see an agent actually do something that even remotely feels like the result of deep and savvy reasoning (the precursor to AGI) - including all the examples in this post.
> Typing 'Find me reservations at X restaurant' and getting unformatted text back is way worse than just going to OpenTable and seeing a UI that has been honed for decades.
You're conflating the example with the opportunity:
"Cancel Service XXX" where the service is riddled with dark patterns. Giving everyone an "assistant" that can do this is a game changer. This is why a lot of people who aren't that deep in tech think open claw is interesting.
> We all learned the lesson that mass-market IT tools almost always outperform in-house
Do they? Because I know a lot of people who have (as an example) terrible setups with Salesforce that they have to use.
> We all learned the lesson that mass-market IT tools almost always outperform in-house,
Funny, I learned the exact opposite lesson. Almost all software sucks, and a good way for it not to suck is to know where the developer is and go tell them their shit is broken, in person.
If you want a large-scale example, one of the two main law enforcement agencies in France spun off LibreOffice into their own legal-writing software. Developed by LEOs who can take up to two weeks a year to work on it. Awesome software. Would cost literally millions if bought on the market.
One of the most important details of Sacks's life which dogged him nearly to the end (and which is important to this NY piece), was a minimization by Sacks of his own sexuality. He was not "openly gay" at all.
One of the biggest problems frontier models will face going forward is how many tasks require expertise that cannot be achieved through Internet-scale pre-training.
Any reasonably informed person realizes that most AI start-ups looking to solve this are not trying to create their own pre-trained models from scratch (they will almost always lose to the hyperscale models).
A pragmatic person realizes that they're not fine-tuning/RL'ing existing models (that path has many technical dead ends).
So, a reasonably informed and pragmatic VC looks at the landscape, realizes they can't just put all their money into the hyperscale models (LPs don't want that), and looks for start-ups that take existing hyperscale models and expose them to data that wasn't in their pre-training set, hopefully in a way that's useful to some users somewhere.
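In practice, "expose existing models to data that wasn't in their pre-training set" usually means some form of retrieval: fetch the customer's private documents, put the relevant ones into the prompt, and let the hyperscale model do the reasoning. A minimal sketch, where the word-overlap scoring and the document list are toy stand-ins for a real embedding-based retriever and document store:

```python
# Toy retrieval-augmented prompting: the startup's value is the private
# docs plus the glue, not the model. Naive word overlap stands in for a
# real embedding-based retriever; the documents are invented examples.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank docs by word overlap with the query; return the top k."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend retrieved context that was never in any pre-training set."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

private_docs = [
    "Q3 refund policy: refunds within 30 days.",
    "Office wifi password rotation schedule.",
    "Refund exceptions require manager approval.",
]
print(build_prompt("What is the refund policy?", private_docs))
```

The prompt then goes to a hyperscale model as-is, which is why the study can describe these companies as wrappers: the retrieval glue is often the entire product.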
To a certain extent, this study is like saying that Internet start-ups in the 90's relied on HTML and weren't building their own custom browsers.
I'm not saying that this current generation of start-ups will be as successful as Amazon and Google, but I just don't know what the counterfactual scenario is.
The question the article doesn't completely answer is how useful these pipelines actually are for the startups. The article certainly implies that for at least some of them there's very little value-add in the wrapper.
Got any links to explanations of why fine tuning open models isn’t a productive solution?
Besides renting the GPU time, what other downsides exist on today’s SOTA open models for doing this?