
Does anyone know why the OS community was so quickly able to replicate (surpass?) DALL-E but not GPT-3?

I would love it if I were able to run these things locally like I am with stable diffusion.



The reason is that DALL-E 2-type models are small and can run on a wide class of commodity hardware. This makes them very accessible, which means a large number of people can contribute.

Large language models gain key capabilities as they increase in size: more reliable fact retrieval, multistep reasoning and synthesis, and complex instruction following. The best publicly accessible model is GPT-3, and at that scale you're looking at hundreds of gigabytes of weights.
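For a rough sense of scale, here's a back-of-envelope calculation of the weight footprint alone (activations, KV cache, and optimizer state would add more on top):

```python
# Rough memory footprint of a dense model's weights alone.
def weights_gb(n_params, bytes_per_param):
    return n_params * bytes_per_param / 1024**3

print(f"175B params @ fp16: {weights_gb(175e9, 2):.0f} GB")  # ~326 GB
print(f"175B params @ fp32: {weights_gb(175e9, 4):.0f} GB")  # ~652 GB
```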

Models small enough to run on most people's machines fall flat when you try to do anything too complex with them. You can read any LLM paper and see how performance scales with size.

The capabilities of available small models have improved a lot recently as we've learned how to train LLMs better, but a larger model is always going to be substantially better, at least when it comes to transformers.


In your opinion, what is the best model I can run on my M1 MBP with 64GB memory and 32 GPU cores?


For practical tasks, I would say Flan-T5 11B, which is 45GB, but in my experience, if you load it the usual Hugging Face way, it can initially take up to 2x the model's size in memory.

GPT-JT was released recently and seems interesting, but I haven't tried it. If you're focused on the scientific domain and want to do open-book Q&A, summarization, keyword extraction, etc., the Galactica 6B-parameter version might be worth checking out.

If your main language is not English, one of the mt0 models might be worth a try: https://huggingface.co/bigscience/mt0-xl

These models are distinguished by being able to follow relatively complex natural-language instructions and examples without needing to be fine-tuned.


I'm able to run a 22B-parameter GPT-Neo model on my 24GB 3090, and can fit a 30B-parameter OPT model when combining my 3090 and 12GB 3080.


Could you point to any resources online about how to do this? e.g. is this using 8-bit quantisation?
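For context, the core idea of 8-bit quantisation is to store each weight as an int8 plus a floating-point scale. This is a simplified per-tensor sketch, not the exact scheme libraries like bitsandbytes actually use:

```python
import numpy as np

def quantize(w):
    # One scale per tensor, mapping the largest magnitude to 127.
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Weights are reconstructed on the fly at some accuracy cost.
    return q.astype(np.float32) * scale

w = np.random.randn(4096).astype(np.float32)
q, s = quantize(w)
print("bytes: fp32", w.nbytes, "-> int8", q.nbytes)  # 4x smaller than fp32
print("max abs error:", np.abs(dequantize(q, s) - w).max())
```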


Is there no way to do a kind of split-apply-combine with these models? So you could train GPT@home?


For inference, the best models are so large they won't even fit in system RAM. GPT@home is not going to make a difference in that scenario.

For training such large models, data parallelism is no longer sufficient; tensor/pipeline parallelism is required. The problem is that communication bottlenecks, differing device/network speeds, and massive data-transfer requirements become serious enough to kill any naive distribute-training-across-the-internet approach. Deep learning companies use fancy 100Gbps+ interconnects, do kernel hacking, and use homogeneous hardware, and it's still a serious challenge. There is no incentive for them to invest in something like GPT@home.
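To put rough numbers on the bottleneck, here's an illustrative calculation assuming 175B parameters, fp16 gradients, and a naive scheme that moves the full gradient size every step:

```python
# Back-of-envelope: time to synchronize gradients for one training step.
def sync_seconds(n_params, bytes_per_param, link_bits_per_sec):
    bits = n_params * bytes_per_param * 8
    return bits / link_bits_per_sec

home = sync_seconds(175e9, 2, 100e6)  # 100 Mbit/s home uplink
dc = sync_seconds(175e9, 2, 100e9)    # 100 Gbit/s datacenter link
print(round(home / 3600, 1), "hours at home vs", round(dc), "s in a datacenter")
```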

But it's not impossible, and there's some research being done in the area, although it'll be a while until a GPT@home approach becomes a ready alternative. See https://arxiv.org/abs/2206.01288 and their recent GPT-JT test for more. Another helpful development would be for networks to become more modular.


RAM isn't terribly expensive; it's not unreasonable to have 1 or 2 TB of RAM. 1TB costs about $3500 as 64GB DIMMs. (Some of my 4U hosts have 96 DDR4 sockets too... though 6TB of RAM is getting a little pricey. :))

> use fancy 100Gbps+ connections,

You can pick up 100Gbps Mellanox NICs on eBay for $50 on a good day, $200 whenever. If you're only connecting two or three hosts, you can just use multiport cards and a couple of DAC cables rather than a switch.

I suspect that for inference there is a substantial locality gain if you're able to batch a lot of users into a single operation, since you can stream the weights through once while applying them to a bunch of queries at the same time. But that isn't necessarily lost on a single user: it would be nice to see a dozen distinct completions at once.
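A toy illustration of that batching effect (arbitrary shapes, nothing model-specific): one matmul applies the weights to every query in the batch, versus re-reading them per user:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)       # one layer's weights
queries = rng.standard_normal((32, 256)).astype(np.float32)  # 32 users' inputs

batched = queries @ W.T                          # weights read once for all users
per_user = np.stack([q @ W.T for q in queries])  # weights re-read 32 times
assert np.allclose(batched, per_user, atol=1e-4)  # same results, better locality
```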


I made an account to reply to this, since I tend to use KoboldAI[1][2] occasionally.

It's an open-source text generation frontend that you can run on your own hardware (or cloud computing like Google Colab). It can be used with any Transformers-compatible text generation model[3] (OpenAI's original GPT-2, EleutherAI's GPT-Neo, Facebook's OPT, etc.).

It's debatable whether OPT has hit that sweet spot of "surpassing" GPT-3 at a smaller size. As far as I know, their biggest freely downloadable model is 66B parameters (175B is available but requires requesting access), but I've had serviceable results with as few as 2.7B parameters, which can run on 16GB of RAM or 8GB of VRAM (via GPU).

There's a prominent member of the KAI community who even fine-tunes them on novels and erotic literature (the latter of which makes for a decent AI "chatting partner").

But you do bring up a great point: the field of open-source text generation develops at a sluggish pace compared to Stable Diffusion. I assume people are more interested in generating their own images than text; it's just more impressive.

[1] - https://github.com/koboldai/koboldai-client

[2] - https://old.reddit.com/r/KoboldAI/

[3] - https://huggingface.co/models?pipeline_tag=text-generation


I’ve wondered the same thing. My working theory is that the AI art models are more interesting to a wider group of people than the language models, meaning they get better returns on the massive sums needed to train such models. AI art is really exciting for anyone who has ever dabbled in art before, because it can do things which I am utterly incapable of doing. For that reason I’m happy to pay for it. AI language is not as exciting because it can basically perform the same tasks I can. So it’s interesting as a curiosity, but not as something I’d pay for.


I asked ChatGPT: "The OpenAI team released the DALL-E model architecture and training details, along with a large dataset of images and their corresponding captions, which allowed the open-source community to replicate and improve upon the model. In contrast, the GPT-3 model is much more complex and the training data is not publicly available, which makes it difficult for the open-source community to replicate or surpass the model. Additionally, the GPT-3 model is significantly larger than DALL-E, with 175 billion parameters, which makes it much more computationally expensive to train and fine-tune."


I would think it's related to the fact that Stable Diffusion can run on consumer-level hardware, whereas the largest language models can't, as they need hundreds of gigabytes of GPU memory.


You can run a text-to-image model on a consumer GPU; meanwhile, you need a cluster of GPUs to run a model with GPT-3's capabilities. Also, DALL-E 2 is really inefficient, so it was easily surpassed by latent diffusion models.



