Hacker Newsnew | past | comments | ask | show | jobs | submit | great_psy's commentslogin

Thanks for working on this!

Is there any way to get those running on iPhone ? I would love to have the ability for it to read articles to me like a podcast.


yes, we're releasing an official mobile sdk and inference engine very soon. if you want to use something until then, some folks from the oss community have built ways to run kitten on ios. if you search kittentts ios on github you should find a few. if you cant find it, feel free to ping me and i can help you set it up. thanks a lot for your support and feedback!

Actually I think there’s another option.

There’s the scenario where LLMs get more efficient in size, and to get 2026 SOTA performance you will be able to get it from consumer grade laptop.

Sure with a 1000B parameter you will get better performance but the average person will have it write some python script, not derive new physics equations.

So in a sense the demand for LLM intelligence with reach a plateau (arguably we are there today for avg person) so there will not be any subsidy required, because the avg person will not need the latest and greatest.

There’s not the same demand pattern for something like uber.


> There’s the scenario where LLMs get more efficient in size, and to get 2026 SOTA performance you will be able to get it from consumer grade laptop.

But isn't that bad for the AI companies, too? Because then people just run an ~2026 SOTA performance open source model on their laptop for free and not pay any subscription.


Yes and no.

Regular folks will not pay Anthropic, but NSA, NASA or research labs might.

I’m not implying this will be a good time for AI companies. I am saying AI as a technology can provide value without it being controlled by only 3 companies.


In a hypothetical future with 2026 level LLMs on a (high end) consumer laptop, I still think that majority of buyers would prefer to pay 20 USD/month for a service. Just for the convenience and flexibility.


> In a hypothetical future with 2026 level LLMs on a (high end) consumer laptop, I still think that majority of buyers would prefer to pay 20 USD/month for a service. Just for the convenience and flexibility.

$20 a month is a lot of money, I don't think the "convenience and flexibility" you get would actually be worth it, unless you've 1) got money to burn, 2) lack the skills to install software, 3) the open source community totally fails to develop a reasonable installer. The LLM service would probably be akin to a scam preying on ignorance, like those companies that will rent you a water softener for like $100/month.


It is a lot compared to what? I believe that a LLM capable laptop will cost considerably more than something that is good-enough for non-LLM productivity tasks. At least within the next 5 years. Say that it would cost 600 USD more, that would buy 30 months of subscription. It is this kind of scenario I think many people will favor the subscription.


Maybe I’m not creative enough to see the potential, but what value does this bring ?

Given the example I saw about CRISPR, what does this model give over a different, non explaining model in the output ? Does it really make me more confident in the output if I know the data came from Arxiv or Wikipedia ?

I find the LLM outputs are subtlety wrong not obviously wrong


It makes the black box slightly more transparent. Knowing more in this regard allows us to be more precise—you go from prompt tweak witchcraft and divination to more of possible science and precise method.


Can this method be extended to go down to the sentence level ?

In the example it shows how much of the reason for an answer is due to data from Wikipedia. Can it drill down to show paragraph or sentence level that influences the answer ?


Your question should be "Can it drill down to show the paragraphs or sentences that influence the answer?"

I believe that the plagiarism complaint about llm models comes from the assumption that there is a one-to-one relationship between training and answers. I think the real and delightfully messier situation is that there is a many-to-one relationship.


The example on the website shows one to many as well: Wikipedia, axive article, etc along with a ratio how much it influences the chunk of the answer.


Exactly! We will have a future post that shows this more granularly over the coming weeks. Here is a post we wrote on how this works at smaller scale: https://www.guidelabs.ai/post/prism/


Oh, that looks like a wonderful article. I just skimmed it, and I hope to get back to it later today. One thing I would love to see is how much of the training set is substantially similar to each other, especially in the code training set.


Great questions. We have several posts in the works that will drill down more into these things. The model was actually designed to answer these questions for any sentence (or group of tokens it generates).

It can tell you which specific text (chunk) in the training data that led to the output the model generated. We plan to show more concrete demos of this capability over the coming weeks.

It can tell you where in the model's representation it learned about science, art, religion etc. And you can trace all of these to either to input context, training data, or model's representations.


Does it? If i make a system prompt for most models right now, tell them they were trained on {list} of datasets, and to attribute their answer to their training data, i get quite similar output. It even seems quite reasonable. The reason being each data corpus has a "vibe" to it and the predictions simply assign response vibe to dataset vibe.

That's still firmly in divination land.


Any place where we can test the llm output without loading it on n64?

Curious of what we can get out of those constraints.


Ok I promised videos here is two. LLM had serious head issues with C and python x86 versus mips c. now coherent english. Phase two is chat interface so we can prompt without seeded prompts, check the code its real inference though! The Emulator ---------------------------------------------- https://bottube.ai/watch/shFVLBT0kHY

The real iron! it runs faster on real iron! ---------------------------------------------- https://bottube.ai/watch/7GL90ftLqvh

The Rom image ---------------------------------------------- https://github.com/sophiaeagent-beep/n64llm-legend-of-Elya/b...

reply

acmiyaguchi 19 hours ago | prev | next [–]

This feels like an AI agent doing it's own thing. The screenshot of this working is garble text (https://github.com/sophiaeagent-beep/n64llm-legend-of-Elya/b...), and I'm skeptical of reasonable generation with a small hard-coded training corpus. And the linked devlog on youtube is quite bizzare too.


The introduction to the article is not paywalled. But the actual 2027 ai story is paywalled


Ah.


Can you provide some names of AI apps who’s revenue > cost ?


I mean, ChatGPT could easily be profitable today if they wanted to, but they're prioritizing growth


[citation needed]


Please stop the BS and take a basic corporate finance class.

FCFF = EBIT(1-t) - Reinvestment.

If OAI stops the Reinvestment, they lose to competition. Got it? Simple.


But leaving a light on 2x the time will equal very close to 2x the price.

Asking “what day is today” vs “create this api endpoint to adjust the inventory” will cost vastly different. And honestly I have no clue where to start to even estimate the cost unless I run the query.


A likely catalyst would be a DeepSeek like model where it has the capability of top model but in a much smaller footprint.

NVDA and cloud providers that are highly staked in AI would likely take a hit if a 70B model could do what Sonnet4 does.

But overall, AI is here to stay even if the market crashes, so it’s not really a AI bubble pop, more like a GPU pop.

And even then, GPU will still be in demand since the training will still need large clusters.


I have not delved to deep in the code, but is there any functional differences it has over Java other than the size ?

Presumably Java would also be pretty tiny if we wrote it in bytecode instead of higher lever Java.


The Java bytecode instruction set actually has a quite complicated specification: https://docs.oracle.com/javase/specs/jvms/se8/html/

Which means implementations also have to be correspondingly complicated. You have to handle quite a few different primitive data types each with their own opcodes, class hierarchies, method resolution (including overloading), a "constant pool" per class, garbage collection, exception handling, ...

I would expect a minimal JVM that can actually run real code generated by a Java compiler to require at least 10x as much code as a minimal Bedrock VM, and probably closer to 100x.


Is this an indication of the peak of the AI bubble ?

In a way this is saying that there are some GPUs just sitting around so they would rather get 50% than nothing for their use.


Seems more like electricity pricing, which has peak and offpeak pricing for most business customers.

To handle peak daily load you need capacity that goes unused in offpeak hours.


Why do you think that this means "idle GPU" rather than a company recognizing a growing need and allocating resources toward it?

It's cheaper because it's a different market with different needs which can be served by systems optimizing for throughput instead latency. Feels like you're looking for something that's not there.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: