
Can you please give an estimate of how much slower/faster it is on your MacBook compared to comparable models running in the cloud?


Sure.

This is a thinking model, so I ran it against o4-mini. Here are the results:

* gpt-oss:20b (local)

  * Time-to-first-token: 2.49 seconds

  * Time-to-completion: 51.47 seconds

  * Tokens-per-second: 2.19

* o4-mini on ChatGPT

  * Time-to-first-token: 2.50 seconds

  * Time-to-completion: 5.84 seconds

  * Tokens-per-second: 19.34

Time to first token was similar, but the thinking piece was _much_ faster on o4-mini. Thinking took the majority of the 51 seconds for gpt-oss:20b.
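In case anyone wants to reproduce numbers like these against their own setup, here's a minimal sketch of how I'd measure time-to-first-token and tokens-per-second from any streaming token source. The `benchmark_stream` helper and the fake generator are my own illustration, not tied to any particular API; you'd pass in whatever iterator your client library gives you.

```python
import time


def benchmark_stream(token_iter):
    """Measure time-to-first-token (TTFT) and decode tokens/sec
    over any iterator that yields tokens as they stream in."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_iter:
        now = time.perf_counter()
        if first is None:
            first = now  # first token arrived
        count += 1
    end = time.perf_counter()
    if first is None:
        return float("nan"), 0.0, 0
    ttft = first - start
    # Decode speed is measured from the first token onward, so TTFT
    # (prompt processing / "thinking" latency) doesn't skew it.
    decode_time = end - first
    tps = (count - 1) / decode_time if decode_time > 0 else 0.0
    return ttft, tps, count


def fake_stream(n_tokens, delay):
    """Stand-in for a real model stream: n tokens, `delay` seconds apart."""
    for i in range(n_tokens):
        time.sleep(delay)
        yield f"tok{i}"
```

Usage: `ttft, tps, n = benchmark_stream(fake_stream(20, 0.01))` — with a real model you'd replace `fake_stream` with the streaming response iterator from your client.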


You can get a pretty good estimate from your memory bandwidth alone. Too many parameters can change with local models (quantization, flash attention, etc.), but the new models are MoE, so they're gonna be pretty fast.
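To make the bandwidth estimate concrete: decoding is roughly memory-bound, since each generated token has to read all active weights from memory once. A back-of-the-envelope ceiling is bandwidth divided by active-weight bytes. The specific numbers in the example (100 GB/s bandwidth, 3.6B active parameters for a 20B MoE, ~0.5 bytes/param for 4-bit quantization) are illustrative assumptions, not measured values.

```python
def est_tokens_per_sec(bandwidth_gb_s, active_params_billion, bytes_per_param):
    """Rough upper bound on decode speed for a memory-bound model:
    each token requires streaming all *active* weights through memory."""
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Assumed example numbers: ~100 GB/s MacBook bandwidth, a MoE model with
# ~3.6B active params, 4-bit quantization (~0.5 bytes per parameter).
ceiling = est_tokens_per_sec(100, 3.6, 0.5)  # ~55 tokens/sec ceiling
```

This is why MoE helps so much locally: only the active-expert parameters count, not the full 20B. Real throughput lands well below the ceiling once KV-cache reads, attention compute, and overhead are included.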



