DEEPSEEK IS NOT OPEN SOURCE, THEY JUST PUBLISHED THE WEIGHTS

dang · on Jan 29, 2025

"Please don't use uppercase for emphasis. If you want to emphasize a word or phrase, put asterisks around it and it will get italicized."

https://news.ycombinator.com/newsguidelines.html.

otterley · on Jan 29, 2025

In this case, I think yelling louder is useful. We need to band together to eliminate this false and misleading appellation.

We have had a term to describe this kind of software for decades: "freeware." That's what this and all other "free to download and use" offerings are; they are not open source under any commonly-understood meaning prior to last year.

fuddle · on Jan 29, 2025

To be called open source under the new Open Source AI Definition, they'd need to release the Data Information, Code, and Parameters. https://opensource.org/ai/open-source-ai-definition

maxloh · on Jan 29, 2025

I like looneysquash's viewpoint about the definition of open source AI. You will need to have all parts involved open-sourced to make a model "open", not just the weights:

> The trained model is object code. Think of it as Java byte code. You have some sort of engine that runs the model. That's like the JVM, and the JIT. And you have the program that takes the training data and trains the model. That's your compiler, your javac, your Makefile and your make. And you have the training data itself, that's your source code.

> Each of the above pieces has its own source code. And the training set is also source code. All those pieces have to be open to have a fully open system. If only the training data is open, that's like having the source, but the compiler is proprietary. If everything but the training set is open, well, that's like giving me gcc and calling it Microsoft Word.

https://news.ycombinator.com/item?id=41952722

visarga · on Jan 29, 2025

> You will need to have all parts involved open-sourced to make a model "open", not just the weights

How do you propose to open-source terabytes of web-scraped text? They give you what they can give you: the paper, the code, and the model weights. You can reimplement the code, and the weights are open for you to do what you like with.

iab · on Jan 29, 2025

Oh my gosh, THANK YOU. A repository of paper images and weights is not open source.

tgtweak · on Jan 29, 2025

I think they also published the training methodology, which others have reproduced, no? The only thing I'm not sure about is whether their low-level Nvidia CTX training code was released under the license. But for a third party to corroborate the training and testing, they would need that code (and likely the training data as well), would they not?

jfarina · on Jan 29, 2025

They outlined the methodology. They didn't publish their code or the training set.

cruffle_duffle · on Jan 29, 2025

How could they publish the terabytes of training data? A million RAR files?

Honestly, would that part even be useful? Like, I want to know how they did the training so I can repro it with my own set of training data, right?

I mean, isn't that the future? Somebody figures out how to do P2P distributed training, and groups can crawl the web and train their own open source models?

tgtweak · on Jan 29, 2025

I'd torrent it :D

thayne · on Jan 29, 2025

True. But at the same time, it is more open than "Open" AI. Or even Llama.