
I think it's mostly the memory bandwidth, though, that makes GPUs so fast with LLMs. My card does about 1 TB/s; CPU RAM won't come near that. I'm sure plenty of optimisations can be had, but I think GPUs will still be significantly ahead.
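A rough sketch of why bandwidth dominates: during single-token decoding, every model weight has to be read from memory once per token, so peak bandwidth divided by model size gives an upper bound on tokens per second. The numbers below are illustrative assumptions (a ~3.5 GB 4-bit 7B model), not measurements.

```python
def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when every weight is read once per token."""
    return bandwidth_gb_s / model_size_gb

# A 7B-parameter model at 4-bit quantization is roughly 3.5 GB of weights.
model_gb = 3.5

gpu = decode_tokens_per_sec(1000, model_gb)  # GPU at ~1 TB/s
cpu = decode_tokens_per_sec(90, model_gb)    # dual-channel DDR5, ~90 GB/s
print(f"GPU bound: ~{gpu:.0f} tok/s, CPU bound: ~{cpu:.0f} tok/s")
```

Real throughput lands below these bounds (compute, KV-cache reads, and overhead all eat into it), but the order-of-magnitude gap between the two memory systems carries through.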

Macs are so good at it because Apple solders the memory onto the SoC package, giving a really wide, low-latency connection.



This is a good and valid comment. It's difficult to predict the future, but I'd be curious what the best-case theoretical performance of an LLM would be on a typical x86 or ARM system with DDR4 or DDR5 RAM. My uneducated guess is that it could be very good, perhaps 50% of the speed of a specialized GPU+VRAM device. In practical terms, the CPU approach is needed for very large contexts, potentially as large as the lifetime of all the interactions you have with your LLM.
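The theoretical ceiling for a DDR system is easy to work out from the bus: peak bandwidth is channels × bus width × transfer rate. A quick sketch (the configurations below are common examples, not a claim about any specific machine) shows why a many-channel server CPU can close much of the gap with a GPU while a desktop cannot:

```python
def ddr_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bits: int = 64) -> float:
    """Peak theoretical DDR bandwidth in GB/s:
    channels * bus width in bytes * mega-transfers per second / 1000."""
    return channels * (bus_bits / 8) * mt_per_s / 1000

desktop = ddr_bandwidth_gb_s(2, 5600)   # dual-channel DDR5-5600
server = ddr_bandwidth_gb_s(12, 4800)   # 12-channel server DDR5-4800
print(f"desktop: {desktop} GB/s, server: {server} GB/s")
```

Against a ~1 TB/s GPU, the desktop lands around 9%, while a 12-channel server gets into the 40-50% range, which is roughly in line with the 50% guess above, at the cost of a server platform.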




