tmule · 2026-03-05T03:22:17 1772680937

Oppenheimer? Really? Quoting a review of an Oppenheimer biography:

“Oppenheimer was clearly an enormously charming man, but also a manipulative man and one who made enemies he need not have made. The really horrible things Oppenheimer did as a young man – placing a poisoned apple on the desk of his advisor at Cambridge, attempting to strangle his best friend – and yes, he really did those things – Monk passes off as the result of temporary insanity, a profound but passing psychological disturbance. (There’s no real attempt by Monk to explain Oppenheimer’s attempt to get Linus Pauling’s wife Ava to run off to Mexico with him, which ended the possibility of collaboration with one of the greatest scientists of the twentieth, or any, century.) Certainly the youthful Oppenheimer did go through a period of serious mental illness; but the desire to get his own way, and feelings of enormous frustration with people who prevented him from getting his own way, seem to have been part of his character throughout his life.”

Seems more like Sam Altman, who is known to get his way, than Dario.

toraway · 2026-03-05T06:29:57 1772692197

The source for the poisoned apple story is Oppenheimer himself, and otherwise uncorroborated to be clear. He spent his life clearly racked by feelings of inadequacy, guilt and self-doubt.

When combined with a somewhat paradoxical large ego and occasionally fanciful reshaping of his own life story or exaggeration, it's entirely plausible (if not likely) that this was in reality a brief intrusive thought or a partially realized fantasy blown up into a catchy anecdote that better fit his self-image of being unable to control his typically human qualities of anger and envy.

If it was Sam Altman, we'd have heard the story from the guy he tried to poison, who instead of filing a police report thought it showed Sam was a real go-getter and offered him his first job on the spot as VP at the company he founded (later forced out by Sam replacing him as CEO, but still considers him a friend with no hard feelings).

CamperBob2 · 2026-03-05T05:36:09 1772688969

The idea isn't that Oppenheimer was a saint, but that the government he served well and faithfully -- at the expense of his soul, some would argue -- turned on him viciously as soon as he dared to question their agenda.

As you suggest, it is easy to imagine Altman in the same hot seat. Never mind his sexual orientation, which the Republican theocrats will eventually use against him as surely as the knives came out for Ernst Röhm.

msabalau · 2026-03-05T16:08:55 1772726935

It's a bit simplistic to personify complex organizations of millions of people like "The Government" or "The Market" as if they were living, breathing persons with a single mind.

There were people working in government who successfully attacked Oppenheimer for personal and/or policy reasons, people who stood by, and people who unsuccessfully supported him, voted to clear him, or condemned the proceedings.

Oppenheimer still paid the price, and arguably, the risks to someone like him today are considerably higher, as the current administration isn't exactly like Eisenhower's.

Nevertheless it's reductionist, reifying sentimentality to talk about "the government" turning "viciously" on someone who "served them well" because they are defying its agenda. The government isn't a character in Game of Thrones. The responsibility lies with the specific individuals who attacked him, and those who stood by.

CamperBob2 · 2026-03-05T18:31:56 1772735516

> Nevertheless it's reductionist, reifying sentimentality to talk about "the government" turning "viciously" on someone who "served them well" because they are defying its agenda. ... The responsibility lies with the specific individuals who attacked him, and those who stood by.

I'm sure that was of great comfort to Oppenheimer, as it will be to Altman and/or Amodei. "It's not you, it's us."

tmule · 2026-01-21T05:09:59 1768972199

Your comments history suggests you’re rather bitter about “nerds” who are likely a few standard deviations smarter than you (Anthropic OG team, Jeff Dean, proof nerds, Linus, …)

jackblemming · 2026-01-21T05:19:58 1768972798

And they’re all dumber than John von Neumann, who cares?

margalabargala · 2026-01-21T05:53:39 1768974819

Transitively, you haven't thought the most thoughts or cared the most about anything, therefore we should disregard what you think and care about?

jackblemming · 2026-01-21T05:59:01 1768975141

The person replying was trying to turn the conversation into some sort of IQ pissing contest. Not sure why, that seems like their own problem. I was reminding them that there is always someone smarter.

wiseowise · 2026-01-21T07:43:57 1768981437

Your comment history is littered with “nerds”, “elite”, “better” and all sorts of comparisons.

> I was reminding them that there is always someone smarter.

And even with this comment you literally do not understand that you have some skewed view of the world. Do you have some high school trauma?

efilife · 2026-01-21T07:46:33 1768981593

> Do you have some high school trauma?

I am not sure ad personam is appropriate here

wiseowise · 2026-01-21T07:56:39 1768982199

This is a thread about their personality.

https://news.ycombinator.com/item?id=46701378

jackblemming · 2026-01-21T09:02:14 1768986134

Where I come from, nerd is a term of endearment buddy.

> And even with this comment you literally do not understand that you have some skewed view of the world.

I’m well aware I don’t have a perfect view of reality and the map isn’t the territory. Do you?

wiseowise · 2026-01-21T19:20:11 1769023211

My bad. I jumped to an incorrect conclusion. Sorry.

tmule · 2026-01-01T04:32:01 1767241921

It is highly unusual for someone to stay put after their net worth increases tenfold. Normally, you would expect an individual to seek out more elite social circles and embrace a significantly more opulent lifestyle. Not having that isn’t a sign of laziness (one can be certain that someone like Warren Buffett lives exactly as he chooses) but rather a reflection of the rare ability to decide that what he has is already enough.

tmule · 2025-11-29T16:38:32 1764434312

Why? I often feed an entire document I hastily wrote into an AI and prompt it to restructure and rewrite it. I think that’s a common pattern.

conartist6 · 2025-11-29T16:41:54 1764434514

It might be, but I really doubt those were the documents flagged as fully AI generated. If it erased all the originality you had put into that work and made it completely bland and regressed-to-the-mean, I would hope that you would notice.

tmule · 2025-11-29T18:08:51 1764439731

My objective function isn’t to maximize the originality of presentation - it’s to preserve the originality of thought and maximize interpretability. Prompting well can solve for that.

exe34 · 2025-11-29T16:56:26 1764435386

> I would hope that you would notice.

he didn't say he read it carefully after running it through the slop machine.

tmule · 2025-11-07T05:45:35 1762494335

China’s breakneck development is difficult for many in the US to grasp (root causes - baselining on sluggish domestic growth, and possessing a condescending view of China). This article offers a far more accurate picture of how China is doing right now: https://archive.is/wZes6

eddyg · 2025-11-07T13:53:18 1762523598

Eye-opening summary... I knew China was ahead, but wow. Thanks for sharing that article.

frays · 2025-11-08T02:15:36 1762568136

Thank you for sharing this article. Eye opening.

tmule · 2025-09-29T02:17:42 1759112262

“The author Andrew Gelman created a whole new branch of Bayesian statistics ...” Love Gelman, but this is playing fast and loose with facts.

kragen · 2025-09-29T02:28:27 1759112907

His book on hierarchical modeling with Hill has 20398 cites on Google Scholar https://scholar.google.com/scholar?cluster=94492350364273118... and Wikipedia calls him "a major contributor to statistical philosophy and methods especially in Bayesian statistics[6] and hierarchical models.[7]", which sounds like the claim is more true than false.

nextos · 2025-09-29T02:45:33 1759113933

He co-wrote the reference textbook on the topic and made interesting methodological contributions, but Gelman acknowledges other people as creators of the theoretical underpinnings of multilevel/hierarchical modeling, including Stein or Donoho [1]. The field is quite old, one can find hierarchical models in articles that were published many decades ago.

Also, IMHO, his best work has been done describing how to do statistics. He has written somewhere I cannot find now that he sees himself as a user of mathematics, not as a creator of new theories. His book Regression and Other Stories is elementary but exceptionally well written. He describes how great Bayesian statisticians think and work, and this is invaluable.

He is updating Data Analysis Using Regression and Multilevel/Hierarchical Models to the same standard, and I guess BDA will eventually come next. As part of the refresh, I imagine everything will be ported to Stan. Interestingly, Bob Carpenter and others working on Stan are now pursuing ideas on variational inference to scale things further.

[1] https://sites.stat.columbia.edu/gelman/research/unpublished/...

kianN · 2025-09-29T03:57:28 1759118248

Totally agree and great point that hierarchical models have been around for a long time; however, these were primarily analytical, leveraging conjugate priors or requiring pretty extensive integration.

I would say his work with Stan and his writings, along with theorists like Radford Neal, really opened the door to a computational approach to hierarchical modeling. And I think this is a meaningfully different field.

CrazyStat · 2025-09-29T12:34:27 1759149267

I give Gelman a lot of credit for popularizing hierarchical models, but you give him too much.

Before Stan existed we used BUGS [1] and then JAGS [2]. And most of the work on computation (by Neal and others) was entirely independent of Gelman.

[1] https://en.wikipedia.org/wiki/Bayesian_inference_using_Gibbs...

[2] https://en.wikipedia.org/wiki/Just_another_Gibbs_sampler

tmule · 2025-09-13T01:08:38 1757725718

This is a remarkable claim. Not a single Indian in tech that I know in my personal or professional life - numbering over a hundred - has ever disputed that Indians have strong (sub)ethnic affinities that color their views on hiring. In addition, nepotism is a real thing in Indian culture. I’d be laughed out of a room with aforesaid folks if I claimed “Indian managers have a tendency to hire anyone else but Indians”. This is either deliberately misleading to “save face” on behalf of the community (another cultural trait), or you’re utterly oblivious in an outlying way to how things work.

srameshc · 2025-09-13T12:55:07 1757768107

> Not a single Indian in tech that I know in my personal or professional life

Your dataset is very small. I come from India

tmule · 2025-09-13T19:50:51 1757793051

Yeah, Sherlock, where do you think I come from if I know upward of 100 Indians well enough to discuss ethnic nepotism with?

tmule · 2025-08-14T16:05:23 1755187523

Unfortunately, as things stand, it’s well-known that behaviors and optimizations in small scale models fail to replicate in larger models.

yorwba · 2025-08-14T19:01:57 1755198117

Doing hyperparameter sweeps on lots of small models to find the optimal values for each size and fitting scaling laws to predict the hyperparameters to use for larger models seems to work reasonably well. I think https://arxiv.org/abs/2505.01618 is the latest advance in that vein.
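A minimal sketch of the extrapolation step described above, assuming the optimal hyperparameter follows a power law in model size. The sweep numbers, the exponent, and the helper names `fit_power_law` and `predict` are hypothetical and chosen only for illustration; they are not taken from the linked paper.

```python
import math

def fit_power_law(sizes, values):
    """Least-squares fit of values ~ a * sizes**b, done in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = power-law exponent
    a = math.exp(mean_y - b * mean_x)  # intercept = prefactor
    return a, b

def predict(a, b, size):
    """Extrapolate the fitted law to a new model size."""
    return a * size ** b

# Hypothetical sweep results: parameter counts and the best learning rate
# found at each size. Synthetic data generated from lr = 3.0 * N**-0.5.
small_sizes = [1e6, 3e6, 1e7, 3e7, 1e8]
best_lrs = [3.0 * n ** -0.5 for n in small_sizes]

a, b = fit_power_law(small_sizes, best_lrs)
lr_for_1b = predict(a, b, 1e9)  # suggested learning rate for a 1B-parameter model
print(f"fitted exponent b = {b:.3f}, predicted lr at 1e9 params = {lr_for_1b:.2e}")
```

On real sweeps the points will not lie exactly on a line in log-log space, so the quality of the fit (and how far it is extrapolated) matters much more than it does in this synthetic case.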

swyx · 2025-08-14T21:22:44 1755206564

the problem is that the eval processes don't really work here if you believe in "Emergent Abilities" https://arxiv.org/abs/2206.07682

exasperaited · 2025-08-14T22:06:13 1755209173

Which we probably should not, at least not the "sudden" emergence that those researchers claimed to see.

https://arxiv.org/abs/2304.15004

Good article about why here; this helped me understand a lot:

https://www.wired.com/story/how-quickly-do-large-language-mo...

jychang · 2025-08-14T23:00:27 1755212427

Why not? It takes models of a certain size to contain xyz neuron/feature.

https://www.youtube.com/watch?v=AgkfIQ4IGaM

That's not a mirage, it's clearly capability that a smaller model cannot demonstrate. A model with fewer parameters and fewer hidden layers cannot have a neuron that lights up when it detects a face.

yorwba · 2025-08-15T07:42:19 1755243739

Consider a single-neuron model that just pools all pixels in an image together. It's possible for the average activation of this neuron to be exactly the same on faces and non-faces, but extremely unlikely given the large range of possibilities. So in aggregate, this neuron can distinguish faces from non-faces, even though, when you apply it to classifying a particular image, it'll be better than random only by an extremely tiny amount.

As the number of neurons increases, the best face/non-face distinguisher neuron gets better and better, but there's never a size where the model cannot recognize faces at all and then you add just a single neuron that recognizes them perfectly.
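A toy sketch of this argument, under strong assumptions that are not in the comment itself: non-face images are modeled as Gaussian noise, faces as the same noise plus a small fixed mean shift (`signal`), and each "neuron" is an untrained random weight vector with an ideal threshold. Under those assumptions the accuracy of a single neuron has a closed form, so no sampling of images is needed.

```python
import math
import random

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def neuron_accuracy(weights, signal, noise_std=1.0):
    """Accuracy of thresholding w.x at the midpoint between the class means,
    when faces ~ N(signal, noise_std^2 I) and non-faces ~ N(0, noise_std^2 I)."""
    dot = sum(w * s for w, s in zip(weights, signal))
    norm = math.sqrt(sum(w * w for w in weights))
    return normal_cdf(abs(dot) / (2.0 * norm * noise_std))

rng = random.Random(0)
dim = 64  # hypothetical number of pixels

# Hypothetical small mean difference between face and non-face images.
signal = [rng.gauss(0.0, 0.05) for _ in range(dim)]

# The single pooling neuron: equal weight on every pixel.
pooled = neuron_accuracy([1.0] * dim, signal)

# Best accuracy any single linear neuron could reach (weights aligned with signal).
ceiling = normal_cdf(math.sqrt(sum(s * s for s in signal)) / 2.0)

# Grow the pool of random neurons and track the best one seen so far.
best = 0.5
best_by_count = {}
for count in range(1, 1025):
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    best = max(best, neuron_accuracy(w, signal))
    if count in (1, 4, 16, 64, 256, 1024):
        best_by_count[count] = best

print(f"pooled neuron accuracy: {pooled:.4f} (ceiling {ceiling:.4f})")
for count, acc in best_by_count.items():
    print(f"best of {count:4d} random neurons: {acc:.4f}")
```

The pooled neuron lands at or slightly above 50%, and the best neuron in the pool creeps toward the ceiling as the pool grows, with no size at which the capability switches on. This only illustrates the shape of the argument; it says nothing about how trained networks actually behave.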

jychang · 2025-08-16T06:34:05 1755326045

> there's never a size where the model cannot recognize faces at all

True

> then you add just a single neuron that recognizes them perfectly

Not true.

Don't think in terms of neurons, think in terms of features. A feature can be spread out over multiple neurons (polysemanticity), I just use a single neuron as a simplified example. But if those multiple neurons perfectly describe the feature, then all of them are important to describe the feature.

The Universal Approximation Theorem implies that a large enough network to perfectly achieve that goal would exist (let's call it size n or larger), so eventually you'd get what you want between 0 and n neurons.

yorwba · 2025-08-16T08:33:43 1755333223

> if those multiple neurons perfectly describe the feature, then all of them are important to describe the feature.

You could remove any one of those neurons before retraining the model from scratch and polysemanticity would slightly increase while performance slightly decreases, but really only slightly. There are no hard size thresholds, just a spectrum of more or less accurate approximations.

victorbjorklund · 2025-08-14T17:16:36 1755191796

Which in itself is very interesting and requires study.

anvuong · 2025-08-14T19:16:02 1755198962

It mostly has to do with sparsity in high-dimensional space. When you scale things to the extreme, everything is very far away from everything else, the space is sparse, random vectors have a very high chance of being nearly orthogonal, etc. All of this makes optimization incredibly slow and difficult. Just another facet of the so-called "curse of dimensionality".
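The near-orthogonality point can be checked numerically. The sketch below is only an illustration of that one geometric fact, assuming vectors with independent standard normal components; the function name `mean_abs_cosine` and the chosen dimensions are arbitrary, and nothing here measures optimization difficulty.

```python
import math
import random

def mean_abs_cosine(dim, pairs=200, seed=0):
    """Average |cosine similarity| between pairs of random Gaussian vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        total += abs(dot / (norm_u * norm_v))
    return total / pairs

# The average |cosine| shrinks as the dimension grows, i.e. random
# directions become closer and closer to orthogonal.
results = {dim: mean_abs_cosine(dim) for dim in (2, 10, 100, 1000)}
for dim, value in results.items():
    print(f"dim={dim:5d}  mean |cos| = {value:.3f}")
```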

jebarker · 2025-08-14T16:43:01 1755189781

Well-known but not well-understood

jph00 · 2025-08-14T20:51:27 1755204687

That's not widely true. E.g., the GPT-4 tech report pointed out that nearly all their experiments were done on models 1000x smaller than the final model.

tmule · 2025-08-15T00:46:12 1755218772

Fair point, though I’d argue that there’s inherent selection bias for improvements that could fit a scaling law curve in the small model regime here.

indoordin0saur · 2025-08-14T17:16:40 1755191800

But why? If we don't know why then how do we figure it out?

tmule · 2025-07-19T19:07:35 1752952055

Discussions about Indian politics or the Indian psyche—especially when laced with Indic supremacist undertones—are off-topic and an annoyance here. Please consider sharing these views in a forum focused on Indian affairs, where they’re more likely to find the traction they deserve.

archon1410 · 2025-07-19T19:44:44 1752954284

It is not "supremacist" to believe that depriving hundreds of millions of people from higher education in their native language is deeply unjust. This reflection was prompted by a comment on why Indian languages are not represented in international competitions, which was prompted by a comment on the competition being available in many languages.

Discussions online have a tendency to go off into tangents like this. It's regrettable that this is such a contentious topic.

tmule · 2025-07-19T22:11:45 1752963105

> self-loathing elites in India

Your disdain for English-speaking Indian elites (pejoratively referred to as ‘Macaulayites’ by Modi’s supporters) is quite telling. That said, as I mentioned earlier, this kind of discourse doesn’t belong here.

archon1410 · 2025-07-19T22:29:27 1752964167

My disdain is for the fact that hundreds of millions of Indians cannot access higher education in their native language, and instead of simply learning a foreign language as a subject like the rest of the world, they have to bear the burden[1] of learning things in a foreign language which they have to simultaneously learn. I have disdain for the people responsible for this mess. I do not have any disdain for any language-speaking class, especially not one which I might be part of.

[1]https://www.mdpi.com/2071-1050/14/4/2168

sealeck · 2025-07-19T20:52:39 1752958359

Much more efficient for us to all speak the same language. Trying to create fragmentation is inefficient.

archon1410 · 2025-07-19T21:07:49 1752959269

You should take that up with the IMO then, or all of the European Union. They provide services in ~two dozen languages.

sealeck · 2025-07-19T21:32:19 1752960739

Sure, but why worsen the situation by using more languages?

queenkjuul · 2025-07-20T02:26:40 1752978400

Human culture should not be particularly concerned with efficiency

tmule · on Feb 11, 2022

It’s unclear what is being praised- is it the high working memory (and reasoning power) of those great men or the ability to have an open discussion about the merits of a case?

30minAdayHN · on Feb 11, 2022

I think he was praising their ability to not bring their egos into the discussion while trusting that the best idea will win. The best idea won without a lot of the inefficient repetition that often happens.

_delibash_ · on Feb 11, 2022

It looks like it's both. But either way, it's not that extraordinary.

coldtea · on Feb 11, 2022

What's not extraordinary? "the ability to have an open discussion about the merits of a case"? You'd be surprised...

quickthrower2 · on Feb 11, 2022

The memory one could be, depending on how much detail and new ground is covered. Imagine your first day as a developer and you hear their architecture for the first time, and you recall all the points of a discussion between 5 expert people there. You remember the words and build the mental model simultaneously. It would be impressive, unless your full time job is consulting in such meetings.

tmule · on Feb 11, 2022

“It would be impressive, unless your full time job is consulting in such meetings.”

There’s self-selection of individuals with high working memory into such roles. There are many managers who attend meetings all day and can’t synthesize what’s being discussed in real time, indicating that this isn’t about practice.