Very different projects so I would not encourage a merge but sharing a code base? I can totally see that being a boon for both and other Windows emulation projects.
We haven't had phones running laptop-grade CPUs/GPUs for that long, and that is a very real hardware feat. Likewise, nobody would've said running a 400b LLM on a low-end laptop was feasible, and that is very much a software triumph.
We've had solid CPUs for a while, but GPUs have lagged behind (and they're the ones that matter for this particular application). iPhones still lead by a comfortable margin on this front, but have historically been pretty limited on the IO front (only supported USB2 speeds until recently).
The GPUs are perfectly solid. Cheap Android handsets have shipped with Vulkan compliance for almost a decade now, and their GPUs are as fully featured as the ones in consoles and PCs. The same goes for Apple handsets, which run byte-identical Metal compute shaders to the Mac. For desktop use they're perfectly capable; the hardware lacks nothing required for inference or gaming that dGPUs ordinarily support.
And even if you raise the requirements, we still have to contend with cheap CUDA-capable GPUs like the one in the ($300!!!) Nintendo Switch, or the Jetson SOCs. The mobile market has had tons of high-speed/low-power options for a very long time now.
They specifically state that they’re aiming for a “fatter” model that expects higher-end hardware, and other projects like Internet in a box already target rpi-style devices.
I think there are technically some 3-bit byteshape quants aimed specifically at running up to 30B MoEs on the 16GB Pi 5, so it would be possible to do something reasonably fat at very low speeds and extremely short contexts (maybe 4k). One of those 32 or 64GB Rockchip-based boards would do better, but there's rarely usable software to go along with them.
An industrial grade Jetson Thor would probably be the ultimate platform for this if you ignore the money part.
There's an important property that emerges from rules 3 and 4 — because the simple algorithm is easier to implement correctly, you can test the fancy algorithm for correctness by comparing its output to the simple one.
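A minimal sketch of that differential-testing idea, using maximum-subarray-sum as a stand-in problem (the function names and the 1000-case random harness are my own illustration, not from the comment):

```python
import random

def simple_max_subarray(xs):
    # Obviously-correct O(n^2) reference: just try every subarray.
    best = xs[0]
    for i in range(len(xs)):
        total = 0
        for j in range(i, len(xs)):
            total += xs[j]
            best = max(best, total)
    return best

def fancy_max_subarray(xs):
    # Kadane's O(n) algorithm: faster, but easier to get subtly wrong.
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

# Differential test: the simple implementation validates the fancy one
# on random inputs, so we never have to hand-compute expected answers.
for _ in range(1000):
    xs = [random.randint(-10, 10) for _ in range(random.randint(1, 30))]
    assert fancy_max_subarray(xs) == simple_max_subarray(xs), xs
```

When the assertion trips, the failing input `xs` is printed, which is usually all you need to debug the fancy version.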
But doesn't the Apple M series NPU support FP8, and as it's a monolithic die (except for the GPU in the M5 Pro and Max) it could be argued it has hardware FP8 support, no?
By that logic, on the M4 (which still has the GPU on the same die as the CPU), CPU cores have hardware accelerated raytracing, which is obviously nonsense.
Apple's hardware does not support FP8 (neither the ANE NPU nor the new "neural accelerator" tensor cores), though the most recent variant supports INT8.
> It is as related to "Agentic $whatever" as your toaster is related to it
These things have hardware FP8 support, and a 1.8TB/s full mesh interconnect between CPUs and GPUs. We can argue about the "agentic" bit, but those are features that don't really matter for any workload other than AI.
Memory bandwidth between cores matters for... literally every workload that isn't single-core (read: all of them). And FP8 doesn't matter at all, because inference on a CPU is too slow to be of any use whatsoever in the days of proper accelerators.
Don't think they would. Games aren't nearly as hungry for memory bandwidth as LLMs are. Also, I expect that the VRAM/GPU/CPU balance would be completely out of whack. Something would be twiddling its thumbs waiting for the rest of the hardware.
We all know that 1/3 + 1/3 + 1/3 = 1, but 0.33 + 0.33 + 0.33 = 0.99. We're sufficiently used to decimal to know that 1/3 doesn't have a finite decimal representation. Decimal 1/10 doesn't have a finite binary representation for the exact same reason: a fraction terminates in a given base only when every prime factor of its denominator divides the base, and 3 doesn't divide 10, just as 10's factor of 5 doesn't divide 2.
The only leaky abstraction here is our bias towards decimal. (Fun fact: "base 10" is meaningless, because every base calls itself base 10)
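You can watch this happen from Python, since `Fraction(float)` exposes the exact binary value a float actually stores (a quick illustration, not anything from the comment above):

```python
from fractions import Fraction

# 0.1 and 0.2 are each rounded to the nearest binary float, and their
# sum rounds again -- the result isn't the float nearest to 0.3.
print(0.1 + 0.2 == 0.3)                  # False

# The exact value behind the literal 0.1: a fraction whose denominator
# is a power of 2. It can't equal 1/10, whose denominator needs a
# factor of 5 that base 2 doesn't have.
print(Fraction(0.1) == Fraction(1, 10))  # False
print(Fraction(0.1).denominator)         # a large power of 2, not 10
```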
It's not rocket science. Those particular database schemas, together with those particular CRUD layers, do something useful, and neither building nor maintaining those applications is part of the core business for most companies, so buying prebuilt from somebody else, and letting them maintain it for you, makes perfect business sense.
DST shenanigans aside (we're in the "US has changed but Europe hasn't" window), 10:00 in SF is 18:00 in London. Meaning their peak time window is 13:00–19:00 London time, or 14:00–20:00 Berlin time.
So us European folks get promotional rates during the morning and evening.
EDIT: Actually, because the promo ends at the end of March, it'll all be within DST shenanigans. So peak times are 12:00–18:00 London, 13:00–19:00 Berlin.
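The standard-library `zoneinfo` module (Python 3.9+) does this DST bookkeeping for you; here's a sketch using an assumed date inside the spring mismatch window (the specific year and promo hours are illustrative, not from the thread):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# 2025-03-20 falls in the window where the US has switched to DST but
# Europe hasn't, so the SF/London offset is 7h instead of the usual 8h.
t = datetime(2025, 3, 20, 10, 0, tzinfo=ZoneInfo("America/Los_Angeles"))

print(t.astimezone(ZoneInfo("Europe/London")).strftime("%H:%M"))  # 17:00
print(t.astimezone(ZoneInfo("Europe/Berlin")).strftime("%H:%M"))  # 18:00
```

Converting via named IANA zones rather than fixed UTC offsets is what makes the "shenanigans" window come out right automatically.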