This is amazing! I love that it requires very fancy, well-designed hardware. It's good someone finally made a chess game appropriate for the TikTok generation.
Really excited about this! Congrats on the launch. Ships make sense as a first target, but I'm curious -- do you see a future in which we have household fission reactors? E.g., powering an entire house (city block, etc.) with fission reactors?
Thank you! Household fission reactors: my take is that, from a technical perspective, we could definitely do it. The open questions are more about proliferation and nuclear waste: will it be allowed and accepted by the public? Not sure, maybe though.
If that's a concern, how do you solve it for shipping? What if some Somali pirate steals your fusion ship? Would they have to have armed protection (on top of the guards they already have, which probably isn't enough when nuclear proliferation is the issue)?
With fusion, there is no uranium or plutonium or other highly radioactive material. The main concern is tritium, which is a categorically smaller concern than enriched uranium (but still needs to be secured and accounted for).
The argument I've heard is that rooftop solar is incredibly expensive compared to all other solar. Add in the other compromises around orientation and obstructed sunlight, and you quickly realize it's likely better to install solar and batteries at dedicated power facilities, which scale better, than to distribute the infrastructure across residential neighborhoods.
We’ve just learned that it’s possible to do AI on less compute (DeepSeek). If OpenAI’s problem is that it can’t keep scaling, then I’d argue that, in the long run, if you believe in their ability to do research, the news this week is a very bullish sign.
IMO the equivalent of Moore’s law for AI (in both software and hardware development) is baked into the price, which doesn’t make the valuation all too crazy.
> We’ve just learned that it’s possible to do AI on less compute (DeepSeek).
There's a huge motte-and-bailey thing going on in the DeepSeek conversation, where the bailey is "It only took $5.5 million!*" (* for exactly one training run of one of several models, at dirt-cheap per-hour spot prices for H100s) and the motte is all sorts of stuff.
Truth is, one run for one model took 2048 GPUs full-time for 2 months, and in my experience with FAANG ML, that means it took 6 months part-time and another 1.5-2.5 runs went absolutely nowhere.
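The arithmetic on that headline figure only works if you assume rock-bottom rental prices. Back-of-envelope (my assumption: ~$2/GPU-hour):

    gpu_hours = 2048 * 24 * 60    # 2048 GPUs, full-time, ~2 months: ~2.95M GPU-hours
    print(gpu_hours * 2)          # ~$5.9M at ~$2/hr -- right around the "$5.5M" claim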
> is baked into the price, which doesn’t make the valuation all too crazy.
Valuations for most large companies have been crazy for a while now. No one values a company based on fundamentals anymore; it's all pure gambling on future predictions.
This isn't unique to OpenAI by any means, but they're a good example. Last I checked, their valuation-to-revenue multiple was in the range of 42x. That's crazy.
- fp8 instead of fp32 precision training = 75% less memory
- multi-token prediction to vastly speed up token output
- Mixture of Experts (MoE), so that inference only activates parts of the model, not the whole thing (~37B parameters at a time, not the entire 671B), which increases efficiency (see the sketch after this list)
- PTX (basically low-level GPU assembly) hacking to pump as much performance as possible out of their H800 GPUs
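The MoE piece is maybe the easiest to picture. A toy sketch of the routing idea (dimensions and expert counts are made up, not DeepSeek's actual architecture):

    import torch
    import torch.nn as nn

    class ToyMoE(nn.Module):
        def __init__(self, d_model=512, n_experts=8, top_k=2):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                [nn.Sequential(nn.Linear(d_model, 2048), nn.GELU(),
                               nn.Linear(2048, d_model)) for _ in range(n_experts)])
            self.top_k = top_k

        def forward(self, x):                          # x: (n_tokens, d_model)
            weights, idx = self.router(x).topk(self.top_k, dim=-1)
            weights = weights.softmax(dim=-1)
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):  # only chosen experts run
                hit = (idx == e)                       # which tokens picked expert e
                rows = hit.any(dim=-1)
                if rows.any():
                    w = (weights * hit).sum(-1, keepdim=True)[rows]
                    out[rows] += w * expert(x[rows])
            return out

Every token still produces a full-size output, but each forward pass only pays for top_k of the n_experts expert MLPs.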
Then, the big innovation of R1 and R1-Zero was finding a way to utilize reinforcement learning within their LLM training.
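In case it's useful, here's a toy REINFORCE-style sketch of that idea with a group-mean baseline (the gist of their GRPO objective, minus the importance ratios, clipping, and KL penalty); reward_fn, model, and tokenizer are assumed HF-style stand-ins, not their code:

    import torch

    def rl_step(model, tokenizer, prompt_ids, reward_fn, optimizer, group_size=8):
        # Sample a group of completions for the same prompt.
        samples = [model.generate(prompt_ids, do_sample=True) for _ in range(group_size)]
        # Score each with a verifiable reward, e.g. "is the final answer right".
        rewards = torch.tensor([reward_fn(tokenizer.decode(s[0])) for s in samples],
                               dtype=torch.float)
        advantages = rewards - rewards.mean()    # group-relative baseline
        loss = 0.0
        for seq, adv in zip(samples, advantages):
            logits = model(seq).logits
            logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
            token_lp = logprobs.gather(-1, seq[:, 1:, None]).squeeze(-1)
            loss = loss - adv * token_lp.sum()   # raise log-prob of high-reward samples
        (loss / group_size).backward()
        optimizer.step(); optimizer.zero_grad()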
They also use some kind of factorized attention that somehow leads to compression of tokens (I still haven't read their papers, so I can't be clearer than this).
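If I had to guess at the shape of it (the paper calls it Multi-head Latent Attention; dimensions below are made up): squeeze each token's KV state through a low-rank bottleneck and cache only the small latent.

    import torch.nn as nn

    d_model, d_latent, n_heads, d_head = 4096, 512, 32, 128
    down = nn.Linear(d_model, d_latent)           # the KV cache stores only this latent
    up_k = nn.Linear(d_latent, n_heads * d_head)  # re-expand to keys at attention time
    up_v = nn.Linear(d_latent, n_heads * d_head)  # re-expand to values

    def kv_from_latent(hidden):                   # hidden: (batch, seq, d_model)
        latent = down(hidden)                     # much smaller than caching full K and V
        return up_k(latent), up_v(latent)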
Honestly, I’m not sure I’m completely sold on the long-term value of LLMs, but this is the most realistic and reasonable take I’ve read on this post so far.
If anything, it’s a downward adjustment in the cost implications, which could actually unlock exponential improvements on a shorter time horizon than expected. Investors getting scared is probably a good opportunity to buy in.
Is there an acronym for edge/local/offline? ELO could be confused with something AI already dominates at. As someone working in the edge/local/offline space it’s interesting to hear these together though. Offline is local but local often isn’t offline :)
It's always been possible to "do (worse) AI on less compute" -- we've had years of open models! I also don't understand how anyone can see this as anything but good news for OpenAI. The ultimate value proposition of AI has always depended on whether it stretches to AGI and beyond, and R1 demonstrates that there are several orders of magnitude of hardware overhang. This makes it easier for OpenAI to succeed, not harder, because it makes it less likely that they'll scale to their financial limits and still fail to surpass humans.
The point is that this was developed outside of OpenAI.
So the real question is: why does anyone believe OpenAI will bring AGI, when the actual innovation was happening at some hedge fund in China while OpenAI was going on an international tour trying to drum up a trillion dollars?
Okay, that argument makes no sense to me. I thought the whole point of VC is that money is cheaper than time to market? So OpenAI didn't microoptimize their training code, sure, but they didn't need to. All the innovation of R1 is that they managed to match OpenAI's tech demo from like a year ago using considerably worse hardware by microoptimizing the hell out of it. And that's cool, full credit to them, it's a mighty impressive model. But they did it like that because they had to. It's very impressive given their constraints, but it doesn't actually advance the field.
The interesting part is that distillations of the reinforcement-learning-trained models are performing so well. That brings the cost of doing certain tasks down dramatically.
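For anyone unfamiliar, distillation here means training a small student model to imitate a big teacher. A minimal sketch of the classic logit-matching flavor (the released R1 distills reportedly just fine-tune on the large model's generated outputs, but the principle is the same):

    import torch.nn.functional as F

    def distill_loss(student_logits, teacher_logits, t=2.0):
        # Match the student's softened output distribution to the teacher's.
        teacher = F.softmax(teacher_logits / t, dim=-1)
        student = F.log_softmax(student_logits / t, dim=-1)
        return F.kl_div(student, teacher, reduction="batchmean") * (t * t)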
This is one of those fun reads because it unifies quite a few things I’ve read about or been interested in recently: Hilbert curves for geospatial indexing in databases, Gray codes, and fractals! And it’s all fairly intuitive: the 1-bit shift makes sense for space traversal and makes the curve’s numbering pattern easier to reason about.
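For anyone who hasn’t seen it, the 1-bit shift in question is a one-liner (standard binary-reflected Gray code; snippet mine, not the article’s):

    def gray(n: int) -> int:
        return n ^ (n >> 1)        # neighbors differ in exactly one bit

    def inverse_gray(g: int) -> int:
        n = 0
        while g:                   # fold the shifted bits back out
            n ^= g
            g >>= 1
        return n

    print([gray(i) for i in range(8)])   # [0, 1, 3, 2, 6, 7, 5, 4]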
It's not always the easiest to follow (we often have disagreements about whether something is a tutorial or a how-to), but it's a really valuable framing and I think our docs have gotten better because of it.
Heh, this was very much the design philosophy behind Hamilton (github.com/dagworks-inc/hamilton).
The basic idea was that if you have a data artifact (initially, columns of dataframes), you should be able to ctrl-f and find it in your codebase: a 1:1 mapping of data -> function.
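Roughly like this (made-up column names, just to show the shape of it): the function name is the output column and the parameter names are the upstream columns, so the dependency graph falls out of grep-able declarations.

    import pandas as pd

    def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
        return spend / signups    # ctrl-f "spend_per_signup" lands exactly here

    def spend_shift_3weeks(spend: pd.Series) -> pd.Series:
        return spend.shift(3)     # depends on "spend" purely via the parameter name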
People take a long time to figure out that the readability gains from greppability are worth whatever verbosity comes with it, largely because they think of code too much as a craft (make it as small/neat as possible) and not as documentation for a live process...