Quantifiable metrics are useful if they're credible, certainly.
But does it seem likely, to you, that a 7B-parameter model would outperform a 314B-parameter model, given that the Chatbot Arena leaderboard is dominated by proprietary, 70B, and 8x7B models?
A well-regarded, modern model like Mixtral 8x7B, ranked 13th on the Chatbot Arena leaderboard, scores 72.7 'Average' on the Open LLM Leaderboard - and yet 'pastiche-crown-clown-7b-dare-dpo' scores 76.5.
Yup, 100%. Grok isn't very good and it was rushed.
The rest, re: the pastiche model etc., proposes things I'm either not claiming, or only something close to what I'm claiming.
N.b. you don't multiply the parameter count by the number of experts to get an effective parameter count. Why not? Think of it this way: every expert still needs to learn how to speak English, so there's a nontrivial amount of duplication among all the experts.
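There's also a simpler accounting reason the multiplication is wrong: only the FFN blocks are replicated per expert, while attention and embeddings are shared. Here's a back-of-the-envelope sketch using the published Mixtral 8x7B architecture numbers (figures are approximate, and this ignores GQA, norms, and biases, so it slightly overcounts - Mixtral's reported totals are about 47B total / 13B active):

```python
# Rough parameter accounting for a Mixtral-8x7B-style MoE transformer.
# Illustrative only: plain multi-head attention assumed, norms/biases ignored.
n_layers = 32
d_model = 4096
d_ff = 14336           # expert FFN hidden size
n_experts = 8
active_experts = 2     # top-2 routing: experts used per token
vocab = 32000

# Attention projections (Q, K, V, O) are shared across experts.
attn_params = 4 * d_model * d_model

# One SwiGLU expert: three weight matrices (gate, up, down).
expert_params = 3 * d_model * d_ff

embed_params = 2 * vocab * d_model  # input embeddings + LM head

total = n_layers * (attn_params + n_experts * expert_params) + embed_params
active = n_layers * (attn_params + active_experts * expert_params) + embed_params

print(f"total  ~{total / 1e9:.1f}B")   # ~47.5B, not the naive 8 x 7B = 56B
print(f"active ~{active / 1e9:.1f}B")  # ~13.7B used per token
```

So "8x7B" is neither 56B in storage nor 7B in compute - and the knowledge-duplication point above means even the ~47B total overstates the model's effective capacity.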
To me, that sounds too good to be true.