Very different projects so I would not encourage a merge but sharing a code base? I can totally see that being a boon for both and other Windows emulation projects.
We haven't had phones running laptop-grade CPUs/GPUs for that long, and that is a very real hardware feat. Likewise, nobody would've said running a 400b LLM on a low-end laptop was feasible, and that is very much a software triumph.
We've had solid CPUs for a while, but GPUs have lagged behind (and they're the ones that matter for this particular application). iPhones still lead by a comfortable margin on this front, but have historically been pretty limited on the IO front (only supported USB2 speeds until recently).
The GPUs are perfectly solid. Cheap Android handsets have shipped with Vulkan compliance for almost a decade now, and their GPUs are as fully featured as the ones in consoles and PCs. The same goes for Apple handsets, which run byte-identical Metal compute shaders to the Mac. For desktop use they're perfectly capable; the hardware lacks nothing required for inference or gaming that dGPUs ordinarily support.
And even if you raise the requirements, we still have to contend with cheap CUDA-capable GPUs like the one in the ($300!!!) Nintendo Switch, or the Jetson SOCs. The mobile market has had tons of high-speed/low-power options for a very long time now.
They specifically state that they’re aiming for a “fatter” model that expects higher-end hardware, and other projects like Internet in a box already target rpi-style devices.
I think there are technically some 3-bit byteshape quants aimed specifically at running up to 30B MoEs on the 16GB Pi 5, so it would be possible to do something reasonably fat at very low speeds and extremely short contexts (maybe 4k). One of those 32 or 64GB Rockchip-based boards would do better, but there's rarely usable software to go along with them.
An industrial grade Jetson Thor would probably be the ultimate platform for this if you ignore the money part.
There's an important property that emerges from rules 3 and 4 — because the simple algorithm is easier to implement correctly, you can test the fancy algorithm for correctness by comparing its output to the simple one.
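A minimal sketch of that differential-testing idea, using maximum-subarray-sum as a stand-in problem (the function names and the 1000-case random harness are my own illustration, not from the comment):

```python
import random

def simple_max_subarray(xs):
    # Obviously-correct O(n^2) reference: just try every subarray.
    best = xs[0]
    for i in range(len(xs)):
        total = 0
        for j in range(i, len(xs)):
            total += xs[j]
            best = max(best, total)
    return best

def fancy_max_subarray(xs):
    # Kadane's O(n) algorithm: faster, but easier to get subtly wrong.
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

# Differential test: the simple implementation validates the fancy one
# on random inputs, so we never have to hand-compute expected answers.
for _ in range(1000):
    xs = [random.randint(-10, 10) for _ in range(random.randint(1, 30))]
    assert fancy_max_subarray(xs) == simple_max_subarray(xs), xs
```

When the assertion trips, the failing input `xs` is printed, which is usually all you need to debug the fancy version.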
But doesn't the Apple M series NPU support FP8, and as it's a monolithic die (except for the GPU in the M5 Pro and Max) it could be argued it has hardware FP8 support, no?
By that logic, on the M4 (which still has the GPU on the same die as the CPU), CPU cores have hardware accelerated raytracing, which is obviously nonsense.
Apple's hardware does not support FP8 (neither the ANE NPU nor the new "neural accelerator" tensor cores), though the most recent variant supports INT8.
> It is as related to "Agentic $whatever" as your toaster is related to it
These things have hardware FP8 support, and a 1.8TB/s full mesh interconnect between CPUs and GPUs. We can argue about the "agentic" bit, but those are features that don't really matter for any workload other than AI.
Memory bandwidth between cores matters for... literally every workload that isn't single-core (read: all of them). And FP8 doesn't matter at all, because inference on a CPU is too slow to be of any use whatsoever in the days of proper accelerators.
Don't think they would. Games aren't nearly as hungry for memory bandwidth as LLMs are. Also, I expect that the VRAM/GPU/CPU balance would be completely out of whack. Something would be twiddling its thumbs waiting for the rest of the hardware.
We all know that 1/3 + 1/3 + 1/3 = 1, but 0.33 + 0.33 + 0.33 = 0.99. We're sufficiently used to decimal to know that 1/3 doesn't have a finite decimal representation. Decimal 1/10 doesn't have a finite binary representation for the exact same reason: a fraction terminates in a given base only when every prime factor of its denominator divides the base, and 3 doesn't divide 10, just as 10's factor of 5 doesn't divide 2.
The only leaky abstraction here is our bias towards decimal. (Fun fact: "base 10" is meaningless, because every base calls itself base 10)
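You can watch this happen from Python, since `Fraction(float)` exposes the exact binary value a float actually stores (a quick illustration, not anything from the comment above):

```python
from fractions import Fraction

# 0.1 and 0.2 are each rounded to the nearest binary float, and their
# sum rounds again -- the result isn't the float nearest to 0.3.
print(0.1 + 0.2 == 0.3)                  # False

# The exact value behind the literal 0.1: a fraction whose denominator
# is a power of 2. It can't equal 1/10, whose denominator needs a
# factor of 5 that base 2 doesn't have.
print(Fraction(0.1) == Fraction(1, 10))  # False
print(Fraction(0.1).denominator)         # a large power of 2, not 10
```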
It's not rocket science. Those particular database schemas, together with those particular CRUD layers, do something useful, and neither building nor maintaining those applications is part of the core business for most companies, so buying prebuilt from somebody else, and letting them maintain it for you, makes perfect business sense.
DST shenanigans aside (we're in the "US has changed but Europe hasn't" window), 10:00 in SF is 18:00 in London. Meaning their peak time window is 13:00–19:00 London time, or 14:00–20:00 Berlin time.
So us European folks get promotional rates during the morning and evening.
EDIT: Actually, because the promo ends at the end of March, it'll all be within DST shenanigans. So peak times are 12:00–18:00 London, 13:00–19:00 Berlin.
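The standard-library `zoneinfo` module (Python 3.9+) does this DST bookkeeping for you; here's a sketch using an assumed date inside the spring mismatch window (the specific year and promo hours are illustrative, not from the thread):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# 2025-03-20 falls in the window where the US has switched to DST but
# Europe hasn't, so the SF/London offset is 7h instead of the usual 8h.
t = datetime(2025, 3, 20, 10, 0, tzinfo=ZoneInfo("America/Los_Angeles"))

print(t.astimezone(ZoneInfo("Europe/London")).strftime("%H:%M"))  # 17:00
print(t.astimezone(ZoneInfo("Europe/Berlin")).strftime("%H:%M"))  # 18:00
```

Converting via named IANA zones rather than fixed UTC offsets is what makes the "shenanigans" window come out right automatically.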