FWIW Zero-3 refers to a common strategy for sharding model components across GPUs (commonly called FSDP-2, Full Sharded Data Parallel). The "3" is the level of sharding (how much stuff to distribute across GPUs, e.g. just weights, versus optimizer state as well, etc.)
It's important to note that pathological liars don't stop lying. In fact, when they're caught lying red-handed, they usually double down and lie even more.
Pretty disgusting behavior from the founders just posting as normal on linkedin/twitter as if this is run-of-the-mill. Fraudsters need to be nipped in the bud, lest we get trump-like scenarios.
reply