People like him and Zitron do serve a useful purpose in balancing the hype from the other side, which, while justified to a great extent, is often a bit too overwhelming.
I do think Gary Marcus says a lot of wrong stuff about LLMs but I don’t see anything too egregious in that post. He’s just describing the results they got a few months ago.
Technical strengths aside, I’ve been impressed with how non-robotic Kimi K2 is. Its personality is closer to Anthropic’s best: pleasant, sharp, and eloquent. A small victory over botslop prose.
I have a different experience in chatting/creative writing. It tends to overuse certain speech patterns without repeating them verbatim, and is strikingly close to the original R1 writing, without being "chaotic" like R1 - unexpected and overly dramatic sci-fi and horror story turns, "somewhere, X happens" at the end etc.
Interestingly enough, EQ-Bench/Creative Writing Bench doesn't spot this despite clearly having it in their samples. This makes me trust it even less.
We may certainly hope Eliezer's other predictions don't prove so well-calibrated.