exegeist's comments

exegeist · 2025-07-19T12:29:01 1752928141

Impressive prediction, especially pre-ChatGPT. Compare to Gary Marcus 3 months ago: https://garymarcus.substack.com/p/reports-of-llms-mastering-...

We may certainly hope Eliezer's other predictions don't prove so well-calibrated.

rafaelero · 2025-07-19T13:28:57 1752931737

Gary Marcus is so systematically and overconfidently wrong that I wonder why we keep talking about this clown.

qoez · 2025-07-19T14:20:06 1752934806

People just give attention to people making surprising, bold, counter-narrative predictions but don't hold them to account when they're wrong.

keeda · 2025-07-19T18:03:13 1752948193

People like him and Zitron do serve a useful purpose in balancing the hype from the other side, which, while justified to a great extent, is often a bit too overwhelming.

Philpax · 2025-07-19T18:06:31 1752948391

Being wrong in the other direction doesn't mean you've found a great balance, it just means you've found a new way to be wrong.

causal · 2025-07-19T13:45:10 1752932710

These numbers feel kind of meaningless without any work showing how he got to 16%.

dcre · 2025-07-19T13:44:45 1752932685

I do think Gary Marcus says a lot of wrong stuff about LLMs but I don’t see anything too egregious in that post. He’s just describing the results they got a few months ago.

m3kw9 · 2025-07-19T13:59:04 1752933544

He definitely cannot use the original arguments from when ChatGPT arrived; he's a perennial goalpost shifter.

shuckles · 2025-07-19T14:18:30 1752934710

My understanding is that Eliezer more or less thinks it's over for humans.

0xDEAFBEAD · 2025-07-19T14:44:08 1752936248

He hasn't given up though: https://xcancel.com/ESYudkowsky/status/1922710969785917691#m

exegeist · 2025-07-12T22:02:31 1752357751

Technical strengths aside, I’ve been impressed with how non-robotic Kimi K2 is. Its personality is closer to Anthropic’s best: pleasant, sharp, and eloquent. A small victory over botslop prose.

orbital-decay · 2025-07-13T11:57:28 1752407848

I have a different experience in chatting/creative writing. It tends to overuse certain speech patterns without repeating them verbatim, and is strikingly close to the original R1 writing, without being "chaotic" like R1 - unexpected and overly dramatic sci-fi and horror story turns, "somewhere, X happens" at the end etc.

Interestingly enough, EQ-Bench/Creative Writing Bench doesn't spot this despite clearly having it in their samples. This makes me trust it even less.