Hacker News

It seems like OpenAI are finally living up to their name for once with this release? Anything I'm missing?

From what I can gather:

1. Includes model weights. I can't find the URL, but they reference them enough and have a CLI tool, so I presume I just haven't found them yet.

2. Includes code: https://github.com/openai/whisper

3. Released under MIT License: https://github.com/openai/whisper/blob/main/LICENSE



It's one model and in a non-strategic area where there are existing open source projects (Kaldi, DeepSpeech, ...).

For a company that raised $1B, that's not exactly living up to their name and original mission.


Yes. The same is true of many products from many companies.

I feel bad about GPT-3 and DALL-E being released under the terms they were, but I don't feel bad about this. I'm not going to condemn OpenAI for the good things they did, but I will hold them accountable for bad things or good ones they didn't do.

I'd given up on OpenAI being open or ethical, but this is a start. It took them down from "evil super-villain" status to mere villain.


> It's one model and in a non-strategic area where there are existing open source projects (Kaldi, DeepSpeech, ...).

I can already tell this is much better than any of the existing open source projects, with the exception of the wav2* family of projects and potentially NVIDIA's NeMo.


Kaldi is an open, pluggable framework and is a ton more flexible and powerful than this. It's used by hundreds of teams, including a number of consumer tech companies you've heard of. They're not going to move to this over it.

Especially because ASR is a living organism. You have to constantly update your language model as new people, ideas, and words enter the common lexicon. As people start talking about "COVID", "metaverse", "King Charles", or whatever else comes along, these need to be added to your language model. You need these updates monthly at a minimum, and OpenAI didn't release the raw data, which means you can't retrain it even if you wanted to spend the time and resources.

So, this is an interesting research project and helpful for small teams and side projects, but it's unlikely it makes any real impact on the industry.


Kaldi just isn't fast or high-quality enough compared to other modern alternatives like wav2letter. I appreciate that it's more flexible than this (it certainly is), but I'm not so sure about "powerful."


Have you actually tried to use Kaldi though? I have. It's basically impenetrable unless your full time job is working with Kaldi.


This kind of model is harder to abuse, so I guess it passed their internal checks much more easily.

I can understand not releasing GPT-3, even if I disagree with the decision.


> This kind of model is harder to abuse, so I guess it passed their internal checks much more easily.

The version I choose to believe: stability.ai ate DALL-E for lunch, and that woke them up.


This is probably also true.


True. The potential of GPT-3 to cause internet mayhem was/is significant. I would argue that the mere act of announcing it was still a catalyst for an eventual GPT-3-like model being released. In revealing it, they established a target for what open source models could aim to achieve, and simultaneously got bad actors thinking about ways to abuse it.


It was a credible argument when GPT-3 was released. But now there are open models that are as capable as GPT-3 and that mayhem has not materialized, with the possible exception of GPT-4chan. They could release it now under a non-commercial license, if they cared to.


Can you provide an example of an open model as capable as GPT-3?

I know there's some "mini-GPT" type models around, but they don't seem nearly as capable.


My experience with GPT-3 is that while it does perform better than those mini-GPT small models, the gap does not compensate for the fact that the small models are free/unrestricted and you can use them as much as you like.

As mentioned elsewhere in the thread there are some large models around the 50-200B band that compete directly with GPT-3, but I haven’t used these.


> I can understand not releasing GPT-3, even if I disagree with the decision.

Why do you disagree?


Two reasons. First, someone else will release something similar. Second, I didn't see a related push from them to work with others in the industry to do something productive toward safety with the time they bought by delaying availability of these kinds of models. So it felt disingenuous.


Several groups already have. Facebook's OPT-175B is available to basically anyone with a .edu address (models up to 66B are freely available) and Bloom-176B is 100% open:

https://github.com/facebookresearch/metaseq

https://huggingface.co/bigscience/bloom


Yup. I meant when it had just come out.


I don’t see how GPT-3 is any more dangerous than Stable Diffusion, Photoshop, that fake news website the crazy person you’re friends with on Facebook really likes, or any of the number of other tools and services that can be used to generate or spread fake information.


All of your examples are limited in some way, but GPT-3 wouldn't have any meaningful limits.

Stable Diffusion: Marks images as AI-generated. (invisible watermark, but still, it's there)

Photoshop: Requires time & effort from a human.

Fake news website: Requires time & effort from a human.


I wouldn't really say Stable Diffusion marks images as AI-generated. There's a script in the Stable Diffusion repository that will do that, but it's not connected to the model itself in a meaningful way. I use Stable Diffusion a lot and I've never touched this script.

https://github.com/CompVis/stable-diffusion/blob/69ae4b35e0a...


What "script" are you using for doing txt2img? The watermark function is automatically called when you use the CLI in two places, https://github.com/CompVis/stable-diffusion/blob/69ae4b35e0a... and https://github.com/CompVis/stable-diffusion/blob/69ae4b35e0a...

Trivial to remove, I'll give you that. But AFAIK, the original repository and most forks apply the watermark automatically unless you've removed it on your own.


>Trivial to remove, I give you that. But AFAIK, the original repository + most forks put the watermark automatically unless you've removed it on your own.

Almost all of the 'low-VRAM' variant forks either have an argument to turn off the watermark (it saves a bit of memory) or come with it disabled altogether.


I linked to the same file you did, that is the "script" I was referring to. And I said that I didn't use it.

My point is that the Python API is more interesting than the txt2img script, and it doesn't add any watermarks.


SD only does that if you don't delete the line of code that does it...


It would be pretty trivial to have an invisible watermark in GPT-3 output, though you don't really need one: just score text with GPT-3 to find out whether it was likely GPT-3-generated or not.
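The scoring idea in a nutshell: a model that reports per-token log probabilities lets you compute perplexity, and text the model itself generated tends to score suspiciously low (the model finds its own output unsurprising). Here's a toy, self-contained sketch; the logprob lists are made up for illustration, whereas a real check would pull them from the API.

```python
# Toy sketch of perplexity-based "was this model-generated?" scoring.
# The token logprob lists below are invented for illustration only.
import math

def perplexity(token_logprobs):
    """exp of the average negative log-likelihood per token."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

model_text = [-0.2, -0.1, -0.3, -0.2]  # model-written: high-confidence tokens
human_text = [-2.5, -1.8, -3.1, -2.2]  # human-written: more "surprising" tokens

assert perplexity(model_text) < perplexity(human_text)
```

In practice you'd threshold the score, and the signal gets noisy for short or heavily edited text, but that's the basic mechanism.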


Because why should the wealthy and connected be the only ones *allowed* to have access to such life-improving technology?



Large is 3GB to save everyone a click. Tiny is 72MB.
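Those download sizes line up with the parameter counts listed in the repo (tiny ~39M, large ~1550M) stored at 2 bytes per parameter (fp16). A quick back-of-envelope check; checkpoint overhead means the real files differ slightly:

```python
# Back-of-envelope: parameter count x 2 bytes/param (fp16) ~ download size.
# Parameter counts are the figures listed in the Whisper repo's model table.
PARAMS = {"tiny": 39e6, "large": 1550e6}
BYTES_PER_PARAM = 2  # fp16

for name, n in PARAMS.items():
    print(f"{name}: ~{n * BYTES_PER_PARAM / 1e6:.0f} MB")
# tiny comes out around 78 MB, large around 3100 MB (~3 GB)
```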


That's unexpectedly lightweight - small enough to run on some phones.




