It's not set up for that, no, though it's theoretically possible!
The issues I see are:
- Transcription models use beam search to choose the most likely words at each step, taking into account the surrounding words. The accuracy will drop a lot if you pick each top word individually as it’s spoken. The surrounding context matters a lot.
- To that point, transcription models do get things wrong (e.g. "best" instead of "test"). LLM post-processing can help here, taking in the top-N hypotheses from the transcription model and determining which makes the most sense ("run the tests", not "run the bests"), adding another layer of semantic understanding. Again, the surrounding context really matters; see the sketch after this list.
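Here's a minimal, self-contained sketch of that rescoring idea. It's illustrative only: `lm_log_prob` is a stand-in for a real LLM scorer, and the combination weight is made up, not the app's actual pipeline.

```python
# Minimal sketch: rescoring top-N transcription hypotheses with an LM.
# `lm_log_prob` is a placeholder scorer, faked here with a tiny phrase
# table so the example runs standalone.

def lm_log_prob(text: str) -> float:
    """Placeholder LM score: higher means more plausible English."""
    plausible = {"run the tests": -2.0, "run the bests": -9.0}
    return plausible.get(text, -20.0)

def pick_best(hypotheses: list[tuple[str, float]]) -> str:
    """Each hypothesis is (text, acoustic_log_prob) from beam search.
    Combine acoustic and LM scores and return the best text."""
    lm_weight = 0.5  # how much to trust the LM vs. the acoustic model
    return max(hypotheses, key=lambda h: h[1] + lm_weight * lm_log_prob(h[0]))[0]

# Top-2 beam outputs: the acoustically likelier hypothesis is wrong.
print(pick_best([("run the bests", -1.0), ("run the tests", -1.5)]))
# -> "run the tests"
```

With these numbers, "run the tests" wins (-1.5 + 0.5·(-2.0) = -2.5 vs. -1.0 + 0.5·(-9.0) = -5.5), which is exactly the kind of fix that's impossible if you commit to each word the moment it's spoken.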
Do you need each word to stream individually? Or would it be sufficient for short phrases to stream?
The MLX inference is so fast that you could accomplish something like the latter by releasing and re-pressing the shortcut every 5-10 words. It's so fast it honestly feels like streaming. In practice, I tend to do something like this anyway, because I find it easier to review shorter transcripts!
It's hard to find unoccupied shortcuts these days! I don't use shortcuts on the numbers often, so I set that as a default. But yes, it's easily configurable in settings so you can choose something that works for your workflows.
Sarp! Good to hear from you! I hope life has been good since the Instagram days. Yes, I've noticed the multi-resizing issue with cmd + 8 - I'll look into it this week. Regarding the cmd + 0 toggle, I think I can probably make that work too. We can make the dismiss shortcut configurable; then you can choose the same keys as the launch shortcut, making it a toggle. I'll also take a look at that this week.
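For illustration, here's a toy sketch of the toggle behavior I have in mind; the names and structure are hypothetical, not the app's real code:

```python
# Toy sketch of the proposed shortcut handling (hypothetical names).

def next_window_state(pressed: str, launch: str, dismiss: str,
                      visible: bool) -> bool:
    """Return whether the window should be visible after a hotkey press."""
    if pressed == launch == dismiss:
        return not visible   # same shortcut for both: acts as a toggle
    if pressed == launch:
        return True          # launch shortcut: always show
    if pressed == dismiss:
        return False         # dismiss shortcut: always hide
    return visible           # unrelated key: no change

# With launch and dismiss both set to cmd+0, repeated presses toggle:
state = False
for _ in range(3):
    state = next_window_state("cmd+0", "cmd+0", "cmd+0", state)
    print(state)  # True, False, True
```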
"Why not Linux or Windows? Gotta start somewhere! If the reception is positive, we’ll work hard to add further support."
As you can see from the commit log, we have 3 people working on this. So we're quite limited in what we can take on. That said, our belief holds and we'd love to support Linux and Windows.
I had "MacOS" in my original title, but HN limits titles to 80 characters!
It would be nice if the README made it clear toward the top that this is Mac software. The screenshot and the mention of Xcode give that vibe, of course, but I kept reading anyway and felt a bit bummed to only confirm it at the end.
Looks like a cool project; wishing y'all the best. Let us know if and when the Linux support drops :)
I think the license choice is great. It allows noncommercial use, modification, and redistribution. It's not "open source" according to the champions of the term (since it violates the use-for-any-purpose requirement), but I'm a huge fan of this license and license several of my projects CC BY-NC where the AGPL would be too heavy-handed.
"It's not recognized as Open Source by the Open Source body, and doesn't meet the criteria of Free/Open Source Software, but is Open Source" is a bit like saying "I used GMO and petroleum based pesticides, but my produce is all organic."
Why should words like "organic" in relation to food mean without pesticides? I mean, all carbon- and water-based life forms are organic, right?
I can define Open Source easily, using the OSI definition.
There is no trademark for "Open Source" because the OSI failed to secure one, but we have decades of use of the term to mean something specific.
Pros:
- Some of the Sora results are absolutely stunning. Check out the detail on the lion, for example!
- The landscapes and aerial shots are absolutely incredible.
- Quality is much better than Mochi & LTX out of the box. Mochi/LTX seem to require specifically optimized workflows (I've seen great img2vid LTX results on Reddit that start with Flux image generations, for example). Hunyuan seems comparable to Sora!
Cons:
- Still nearly impossible to access Sora despite the “launch”. My generations today were in the 2000s, implying that it’s only open to a very small number of people. There’s no API yet, so it’s not an option for developers.
- Sora struggles with physical interactions. Watch the dancers moonwalk, or the ball go through the dog. HunyuanVideo seems to be a bit better in this regard.
- Can't run it locally (obviously)
- I haven't tested this, but I think it's safe to assume Sora will be censored extensively. HunyuanVideo is surprisingly open (I've seen NSFW generations!)
- I’m getting weird camera angles from Sora, but that could likely be solved with better prompting.
Overall, I’d say it’s the best model I've played with, though I haven’t spent much time on other non-open-source ones. Hunyuan gives it a run for its money, though!
I can't speak to any of those videos in a technical sense but personally, I don't feel like any of them are good?
The vibe they give me is similar to the iPhone photography commercials where yes, in theory, a picnic in the park could look exactly like this except for all the parts that seem movie perfect.
I guess it's really more of a colour grading question where most of the Sora colour grading triggers that part of my brain that says "I'm watching a movie and this isn't real" without quite realising why.
A few of the Hunyuan videos in contrast seem a bit more believable even though they have some obvious glitches at times.
The other thing I think Sora has is that thing in commercials where no one else except the protagonist exists and nothing is ever inconvenient. The video of the teacher in a classroom with no students reminds me of that as well as the picnic in the park where there's wide open space with no one around.
I suppose it depends if the goal is to generate believable video and how you define believable.
Hunyuan was more realistic but lower quality than Sora: shorter videos at lower resolution or bitrate. The downside to Sora's sharpness is that it makes mistakes more apparent. Also funny that Sora didn't understand the rolling dunes metaphor.