
Can you please give an estimate of how much slower/faster it is on your MacBook compared to comparable models running in the cloud?


Sure.

This is a thinking model, so I ran it against o4-mini. Here are the results:

* gpt-oss:20b (local)

  * Time-to-first-token: 2.49 seconds

  * Time-to-completion: 51.47 seconds

  * Tokens-per-second: 2.19

* o4-mini on ChatGPT

  * Time-to-first-token: 2.50 seconds

  * Time-to-completion: 5.84 seconds

  * Tokens-per-second: 19.34

Time to first token was similar, but the thinking piece was _much_ faster on o4-mini. Thinking took the majority of the 51 seconds for gpt-oss:20b.
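In case anyone wants to reproduce numbers like these against their own setup, here's a minimal sketch of how I'd measure time-to-first-token and tokens-per-second from any streaming token source. The `benchmark_stream` helper and the fake generator are my own illustration, not tied to any particular API; you'd pass in whatever iterator your client library gives you.

```python
import time


def benchmark_stream(token_iter):
    """Measure time-to-first-token (TTFT) and decode tokens/sec
    over any iterator that yields tokens as they stream in."""
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_iter:
        now = time.perf_counter()
        if first is None:
            first = now  # first token arrived
        count += 1
    end = time.perf_counter()
    if first is None:
        return float("nan"), 0.0, 0
    ttft = first - start
    # Decode speed is measured from the first token onward, so TTFT
    # (prompt processing / "thinking" latency) doesn't skew it.
    decode_time = end - first
    tps = (count - 1) / decode_time if decode_time > 0 else 0.0
    return ttft, tps, count


def fake_stream(n_tokens, delay):
    """Stand-in for a real model stream: n tokens, `delay` seconds apart."""
    for i in range(n_tokens):
        time.sleep(delay)
        yield f"tok{i}"
```

Usage: `ttft, tps, n = benchmark_stream(fake_stream(20, 0.01))` — with a real model you'd replace `fake_stream` with the streaming response iterator from your client.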


You can get a pretty good estimate from your memory bandwidth alone. Too many parameters can change with local models (quantization, flash attention, etc.), but the new models are MoE, so they're gonna be pretty fast.
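To make the bandwidth estimate concrete: decoding is roughly memory-bound, since each generated token has to read all active weights from memory once. A back-of-the-envelope ceiling is bandwidth divided by active-weight bytes. The specific numbers in the example (100 GB/s bandwidth, 3.6B active parameters for a 20B MoE, ~0.5 bytes/param for 4-bit quantization) are illustrative assumptions, not measured values.

```python
def est_tokens_per_sec(bandwidth_gb_s, active_params_billion, bytes_per_param):
    """Rough upper bound on decode speed for a memory-bound model:
    each token requires streaming all *active* weights through memory."""
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Assumed example numbers: ~100 GB/s MacBook bandwidth, a MoE model with
# ~3.6B active params, 4-bit quantization (~0.5 bytes per parameter).
ceiling = est_tokens_per_sec(100, 3.6, 0.5)  # ~55 tokens/sec ceiling
```

This is why MoE helps so much locally: only the active-expert parameters count, not the full 20B. Real throughput lands well below the ceiling once KV-cache reads, attention compute, and overhead are included.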



