yes, we're releasing an official mobile sdk and inference engine very soon. if you want to use something until then, some folks from the oss community have built ways to run kitten on ios. if you search kittentts ios on github you should find a few.
if you can't find it, feel free to ping me and i can help you set it up. thanks a lot for your support and feedback!
There’s the scenario where LLMs get more efficient in size, and you will be able to get 2026 SOTA performance from a consumer-grade laptop.
Sure, with a 1000B-parameter model you will get better performance, but the average person will have it write some Python script, not derive new physics equations.
So in a sense the demand for LLM intelligence will reach a plateau (arguably we are there today for the average person), so there will not be any subsidy required, because the average person will not need the latest and greatest.
There’s not the same demand pattern for something like uber.
> There’s the scenario where LLMs get more efficient in size, and you will be able to get 2026 SOTA performance from a consumer-grade laptop.
But isn't that bad for the AI companies, too? Because then people just run a ~2026 SOTA open source model on their laptop for free and don't pay for any subscription.
Regular folks will not pay Anthropic, but NSA, NASA or research labs might.
I’m not implying this will be a good time for AI companies. I am saying AI as a technology can provide value without it being controlled by only 3 companies.
In a hypothetical future with 2026 level LLMs on a (high end) consumer laptop, I still think that the majority of buyers would prefer to pay 20 USD/month for a service. Just for the convenience and flexibility.
> In a hypothetical future with 2026 level LLMs on a (high end) consumer laptop, I still think that the majority of buyers would prefer to pay 20 USD/month for a service. Just for the convenience and flexibility.
$20 a month is a lot of money. I don't think the "convenience and flexibility" you get would actually be worth it, unless 1) you've got money to burn, 2) you lack the skills to install software, or 3) the open source community totally fails to develop a reasonable installer. The LLM service would probably be akin to a scam preying on ignorance, like those companies that will rent you a water softener for like $100/month.
It is a lot compared to what? I believe that an LLM-capable laptop will cost considerably more than something that is good enough for non-LLM productivity tasks, at least within the next 5 years. Say that it would cost 600 USD more: that would buy 30 months of subscription. It is in this kind of scenario that I think many people will favor the subscription.
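The break-even arithmetic in that comment can be sketched in a few lines. This is only an illustration using the two figures given above (a 600 USD hardware premium and a 20 USD/month subscription); the function name is hypothetical, and it ignores electricity, resale value, and price changes.

```python
def breakeven_months(hardware_premium_usd: float, subscription_usd_per_month: float) -> float:
    """Months of subscription that the extra hardware cost would pay for."""
    if subscription_usd_per_month <= 0:
        raise ValueError("subscription price must be positive")
    return hardware_premium_usd / subscription_usd_per_month


# Figures taken from the comment: 600 USD premium, 20 USD/month subscription.
months = breakeven_months(600, 20)
print(months)  # 30.0
```

If the buyer expects to keep the laptop for fewer months than this, the subscription is the cheaper option under these assumptions.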
Maybe I’m not creative enough to see the potential, but what value does this bring?
Given the example I saw about CRISPR, what does this model's output give over that of a different, non-explaining model?
Does it really make me more confident in the output if I know the data came from arXiv or Wikipedia?
I find that LLM outputs are subtly wrong, not obviously wrong.
It makes the black box slightly more transparent. Knowing more in this regard allows us to be more precise: you go from prompt-tweak witchcraft and divination to something closer to a science with a precise method.
Can this method be extended to go down to the sentence level ?
In the example it shows how much of the reason for an answer is due to data from Wikipedia. Can it drill down to show paragraph or sentence level that influences the answer ?
Your question should be "Can it drill down to show the paragraphs or sentences that influence the answer?"
I believe that the plagiarism complaint about LLMs comes from the assumption that there is a one-to-one relationship between training data and answers. I think the real and delightfully messier situation is that there is a many-to-one relationship.
Exactly! We will have a future post that shows this more granularly over the coming weeks. Here is a post we wrote on how this works at smaller scale: https://www.guidelabs.ai/post/prism/
Oh, that looks like a wonderful article. I just skimmed it, and I hope to get back to it later today. One thing I would love to see is how much of the training set is substantially similar to each other, especially in the code training set.
Great questions. We have several posts in the works that will drill down more into these things. The model was actually designed to answer these questions for any sentence (or group of tokens it generates).
It can tell you which specific text (chunk) in the training data led to the output the model generated. We plan to show more concrete demos of this capability over the coming weeks.
It can tell you where in the model's representation it learned about science, art, religion, etc. And you can trace all of these to either the input context, the training data, or the model's representations.
Does it? If I make a system prompt for most models right now, tell them they were trained on {list} of datasets, and ask them to attribute their answer to their training data, I get quite similar output. It even seems quite reasonable. The reason is that each data corpus has a "vibe" to it, and the predictions simply match the response vibe to the dataset vibe.
OK, I promised videos; here are two. The LLM had serious head issues with C and Python on x86 versus MIPS C. Now it produces coherent English. Phase two is a chat interface so we can prompt without seeded prompts. Check the code, it's real inference though!
The Emulator: https://bottube.ai/watch/shFVLBT0kHY
This feels like an AI agent doing its own thing. The screenshot of this working is garbled text (https://github.com/sophiaeagent-beep/n64llm-legend-of-Elya/b...), and I'm skeptical of reasonable generation with a small hard-coded training corpus. And the linked devlog on YouTube is quite bizarre too.
But leaving a light on for 2x the time will cost very close to 2x the price.
Asking “what day is today” vs “create this api endpoint to adjust the inventory” will cost vastly different amounts. And honestly I have no clue where to start to even estimate the cost unless I run the query.
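To make the contrast concrete, here is a rough sketch of how per-token billing turns into a per-query cost. Everything numeric in it is an assumption for illustration: the token counts and the per-million-token prices are made-up placeholders, not any provider's real rates, and the function name is hypothetical. The output token count in particular is not known before the query runs, which is exactly the estimation problem described above.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost of one request when input and output tokens are billed separately."""
    total = input_tokens * usd_per_million_input + output_tokens * usd_per_million_output
    return total / 1_000_000


# Placeholder prices (assumed, not real rates).
PRICE_IN, PRICE_OUT = 1.0, 5.0

# Placeholder token counts: a one-line question vs. a coding task that
# reads a lot of context and writes a lot of code.
small = estimate_cost_usd(10, 20, PRICE_IN, PRICE_OUT)
large = estimate_cost_usd(50_000, 8_000, PRICE_IN, PRICE_OUT)

print(f"small query: ${small:.5f}")
print(f"large query: ${large:.5f}")
print(f"ratio: {large / small:.0f}x")
```

Unlike the light bulb, the "usage" here depends on how much the model reads and writes, so two queries that look similar to the user can differ in cost by orders of magnitude.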
Which means implementations also have to be correspondingly complicated. You have to handle quite a few different primitive data types each with their own opcodes, class hierarchies, method resolution (including overloading), a "constant pool" per class, garbage collection, exception handling, ...
I would expect a minimal JVM that can actually run real code generated by a Java compiler to require at least 10x as much code as a minimal Bedrock VM, and probably closer to 100x.
Why do you think that this means "idle GPU" rather than a company recognizing a growing need and allocating resources toward it?
It's cheaper because it's a different market with different needs, which can be served by systems optimizing for throughput instead of latency. Feels like you're looking for something that's not there.
Is there any way to get those running on an iPhone? I would love to have the ability for it to read articles to me like a podcast.