
I’m still trying to understand what is the biggest group of people that uses local AI (or will)? Students who don’t want to pay but somehow have the hardware? Devs who are price conscious and want free agentic coding?

Local, in my experience, can’t even pull data from an image without hallucinating (Qwen 2.5 VL in that case). Hopefully local/small models keep getting better and devices get better at running bigger ones.

It feels like we do it because we can, more than because it makes sense - which I am all for! I just wonder if I’m missing some kind of major use case all around me that justifies chaining together a bunch of Mac Studios or buying a really great graphics card. Tools like exo are cool and the idea of distributed compute is neat, but what edge cases truly need it so badly that it’s worth all the effort?



Privacy, both personal and for corporate data protection, is a major reason. Unlimited usage, allowing offline use, supporting open source, not worrying about a good model being taken down/discontinued or changed, and the freedom to use uncensored models or model fine-tunes are other benefits (though this OpenAI model is super-censored - “safe”).

I don’t have much experience with local vision models, but for text questions the latest local models are quite good. I’ve been using Qwen 3 Coder 30B-A3B a lot to analyze code locally and it has been great. While not as good as the latest big cloud models, it’s roughly on par with SOTA cloud models from late last year in my usage. I also run Qwen 3 235B-A22B 2507 Instruct on my home server, and it’s great, roughly on par with Claude 4 Sonnet in my usage (but slow of course running on my DDR4-equipped server with no GPU).


+1 - I work in finance, and there's no way we're sending our data and code outside the organization. We have our own H100s.


Add big law to the list as well. There are at least a few firms here that I am just personally aware of running their models locally. In reality, I bet there are way more.


Add government here too (along with all the firms that service government customers)


Add healthcare. Cannot send our patients’ data to a cloud provider


A ton of EMR systems are cloud-hosted these days. There’s already patient data for probably a billion humans in the various hyperscalers.

Totally understand that approaches vary but beyond EMR there’s work to augment radiologists with computer vision to better diagnose, all sorts of cloudy things.

It’s here. It’s growing. Perhaps in your jurisdiction it’s prohibited? If so I wonder for how long.


In the US, HIPAA requires that health care providers complete a Business Associate Agreement with any other orgs that receive PHI in the course of doing business [1]. It basically says they understand HIPAA privacy protections and will work to fulfill the contracting provider's obligations regarding notification of breaches and deletion. Obviously any EMR service will include this by default.

Most orgs charge a huge premium for this. OpenAI offers it directly [2]. Some EMR providers are offering it as an add-on [3], but last I heard, it's wicked expensive.

1: https://www.hhs.gov/hipaa/for-professionals/covered-entities...

2: https://help.openai.com/en/articles/8660679-how-can-i-get-a-...

3: https://www.ntst.com/carefabric/careguidance-solutions/ai-do...


> Most LLM companies might not even offer it.

I'm pretty sure the LLM services of the big general-purpose cloud providers do (I know for sure that Amazon Bedrock is a HIPAA Eligible Service, meaning it is covered within their standard Business Associate Addendum [their name for the Business Associate Agreement as part of an AWS contract]).

https://aws.amazon.com/compliance/hipaa-eligible-services-re...


Sorry to edit snipe you; I realized I hadn't checked in a while so I did a search and updated my comment. It appears OpenAI, Google, and Anthropic also offer BAAs for certain LLM services.


I worked at a big health care company recently. We were using Azure's private instances of the GPT models. Fully industry compliant.


Even if it's possible, there is typically a lot of paperwork to get that stuff approved.

There might be a lot less paperwork to just buy 50 decent GPUs and have the IT guy self-host.


Europe? US? In Finland doctors can send live patient encounters to azure openai for transcription and summarization.


In the US, it would be unthinkable for a hospital to send patient data to something like ChatGPT or any other public services.

It might be possible with certain specific regions/environments of Azure, though, because IIRC they have a few that support government confidentiality requirements, and some that tout HIPAA compliance as well. Not sure about the details of those, though.


Possibly stupid question, but does this apply to things like M365 too? Because just like with Inference providers, the only thing keeping them from reading/abusing your data is a pinky promise contract.

Basically, isn't your data as safe/unsafe in a sharepoint folder as it is sending it to a paid inference provider?


Yep, companies are just paranoid because it's new. Just like the cloud back then. Sooner or later everyone will use an AI provider.


A lot of people and companies use local storage and compute instead of the cloud. Cloud data is leaked all the time.


Look at (private) banks in Switzerland; there are enough press releases, and I can confirm most of them.

Managing private clients’ direct data is still a concern if it can be directly linked to them.

Only JB, I believe, has on-premises infrastructure for these use cases.


This is not a shared sentiment across the buy side. I’m guessing you work at a bank?


Does it mean that renting a Bare metal server with H100s is also out of question for your org?


Do you have your own platform to run inference?


I do think devs are one of the genuine user groups for local models going forward. No price hikes or random caps dropped in the middle of the night, and in many instances I think local agentic coding is going to be faster than the cloud. It’s a great use case.


I am extremely cynical about this entire development, but even I think that I will eventually have to run stuff locally; I've done some of the reading already (and I am quite interested in the text to speech models).

(Worth noting that "run it locally" is already Canva/Affinity's approach for Affinity Photo. Instead of a cloud-based model like Photoshop, their optional AI tools run using a local model you can download. Which I feel is the only responsible solution.)


I agree totally. My only problem is that local models running on my old Mac mini run much slower than, for example, Gemini 2.5 Flash. I have my Emacs set up so I can switch between a local model and one of the much faster commercial models.

Someone else responded to you about working for a financial organization and not using public APIs - another great use case.


These being mixture-of-experts (MoE) models should help. The 20b model only has 3.6b params active at any one time, so minus a bit of overhead the speed should be like running a 3.6b model (while still requiring the RAM of a 20b model).

Here's the ollama version (4.6-bit quant, I think?) run with --verbose:

  total duration:       21.193519667s
  load duration:        94.88375ms
  prompt eval count:    77 token(s)
  prompt eval duration: 1.482405875s
  prompt eval rate:     51.94 tokens/s
  eval count:           308 token(s)
  eval duration:        19.615023208s
  eval rate:            15.70 tokens/s

15 tokens/s is pretty decent for a low-end MacBook Air (M2, 24GB of RAM). Yes, it's not the ~250 tokens/s of 2.5-flash, but for my use case anything above 10 tokens/sec is good enough.
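If anyone wants the same numbers programmatically instead of eyeballing --verbose, the raw counters come back from ollama's REST API. A rough sketch; the gpt-oss:20b tag and the default localhost port are my assumptions:

  # Rough sketch: ask a local ollama server for a completion and compute tokens/s
  # from the counters it returns. Assumes ollama on its default port and the 20b
  # model pulled under the tag "gpt-oss:20b".
  import requests

  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={"model": "gpt-oss:20b", "prompt": "Explain MoE models briefly.", "stream": False},
  ).json()

  # duration fields are reported in nanoseconds
  prompt_rate = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
  eval_rate = resp["eval_count"] / (resp["eval_duration"] / 1e9)
  print(f"prompt eval rate: {prompt_rate:.2f} tokens/s, eval rate: {eval_rate:.2f} tokens/s")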


Yes, and help with grant reviews. Not permitted to use web AI.


It's striking how much of the AI conversation focuses on new use cases, while overlooking one of the most serious non-financial costs: privacy.

I try to be mindful of what I share with ChatGPT, but even then, asking it to describe my family produced a response that was unsettling in its accuracy and depth.

Worse, after attempting to delete all chats and disable memory, I noticed that some information still seemed to persist. That left me deeply concerned—not just about this moment, but about where things are headed.

The real question isn't just "what can AI do?"—it's "who is keeping the record of what it does?" And just as importantly: "who watches the watcher?" If the answer is "no one," then maybe we shouldn't have a watcher at all.


> Worse, after attempting to delete all chats and disable memory, I noticed that some information still seemed to persist.

I'm fairly sure "seemed" is the key word here. LLMs are excellent at making things up - they rarely say "I don't know" and instead generate the most probable guess. People also famously overestimate their own uniqueness. Most likely, you accidentally recreated a kind of Barnum effect for yourself.


  Worse, after attempting to delete all chats and disable memory, I noticed that some information still seemed to persist.
ChatGPT was court-ordered to save history logs.

https://www.malwarebytes.com/blog/news/2025/06/openai-forced...


That only means that OpenAI have to keep logs of all conversations, not that ChatGPT will retain memories of all conversations.


you could explain that to ChatGPT and it would agree but then again, if you HAVE TO keep the logs ...


> I try to be mindful of what I share with ChatGPT, but even then, asking it to describe my family produced a response that was unsettling in its accuracy and depth.

> Worse, after attempting to delete all chats and disable memory, I noticed that some information still seemed to persist.

Maybe I'm missing something, but why wouldn't that be expected? The chat history isn't their only source of information - these models are trained on scraped public data. Unless there's zero information about you and your family on the public internet (in which case - bravo!), I would expect even a "fresh" LLM to have some information even without you giving it any.


I think you are underestimating how notable a person needs to be for their information to be baked into a model.


LLMs can learn from a single example.

https://www.fast.ai/posts/2023-09-04-learning-jumps/


That doesn’t mean they learn from every single example.


Healthcare organizations that can't (easily) send data over the wire while remaining in compliance

Organizations operating in high stakes environments

Organizations with restrictive IT policies

To name just a few -- well, the first two are special cases of the last one

RE your hallucination concerns: the issue is overly broad ambitions. Local LLMs are not general purpose -- if what you want is local ChatGPT, you will have a bad time. You should have a highly focused use case, like "classify this free text as A or B" or "clean this up to conform to this standard": this is the sweet spot for a local model


Pretty much all the large players in healthcare (provider and payer) have model access (OpenAI, Gemini, Anthropic)


This may be true for some large players in coastal states but definitely not true in general

Your typical non-coastal state run health system does not have model access outside of people using their own unsanctioned/personal ChatGPT/Claude accounts. In particular even if you have model access, you won't automatically have API access. Maybe you have a request for an API key in security review or in the queue of some committee that will get to it in 6 months. This is the reality for my local health system. Local models have been a massive boon in the way of enabling this kind of powerful automation at a fraction of the cost without having to endure the usual process needed to send data over the wire to a third party


That access is over a limited API and usually under heavy restrictions on the healthcare org side (e.g., only using a dedicated machine, locked-up software, tracked responses, and so on).

Running a local model is often much easier: if you already have the data on a machine and can run a model without touching the network, you can often do it without any new approvals.


What? It’s a straight connect to the model APIs from Azure, AWS, or GCP.

I am literally using Claude opus 4.1 right now.


> I am literally using Claude opus 4.1 right now

On HIPAA data?

HIPAA systems at any sane company will not have "a straight connect" to anything on Azure, AWS, or GCP. They will likely have a special layer dedicated to record keeping and compliance.


Most healthcare systems are not using Azure, AWS, or GCP


Aren’t there HIPAA-compliant clouds? I thought Azure had an offer to that effect, and I imagine that’s the type of place they’re doing a lot of things now. I’ve landed roughly where you have, though: text stuff is fine, but don’t ask it to interact with files/data you can’t copy-paste into the box. If a user doesn’t care to go through the trouble to preserve privacy - and I think it’s fair to say a lot of people claim to care but their behavior doesn’t change - then I just don’t see it being a thing people bother with. Maybe something to use offline while on a plane? But even then, I guess United will have Starlink soon, so plane connectivity is gonna get better.


It's less that the clouds are compliant and more that risk management is paranoid. I used to do AWS consulting, and it wouldn't matter if you could show that some AWS service had attestations out the wazoo or that you could even use GovCloud -- some folks just wouldn't update priors.



If you're building any kind of product/service that uses AI/LLMs, the answer is the same as why any company would want to run any other kind of OSS infra/service instead of relying on some closed proprietary vendor API:

  - Costs.
  - Rate limits.
  - Privacy.
  - Security.
  - Vendor lock-in.
  - Stability/backwards-compatibility.
  - Control.
  - Etc.


Except many OSS products have all of that and equal or better performance.


Why not turn the question around? All other things being equal, who would prefer to use a rate-limited and/or for-pay service if you could obtain at least comparable quality locally for free, with no limitations, no privacy concerns, no censorship (beyond that baked into the weights you choose to use), and no net access required?

It's a pretty bad deal. So it must be that all other things aren't equal, and I suppose the big one is hardware. But neural net based systems always have a point of sharply diminishing returns, which we seem to have unambiguously hit with LLMs already, while the price of hardware is constantly decreasing and its quality increasing. So as we go further into the future, the practicality of running locally will only increase.


> I’m still trying to understand what is the biggest group of people that uses local AI (or will)?

Well, the model makers and device manufacturers of course!

While your Apple, Samsung, and Googles of the world will be unlikely to use OSS models locally (maybe Samsung?), they all have really big incentives to run models locally for a variety of reasons.

Latency, privacy (Apple), cost to run these models on behalf of consumers, etc.

This is why Google started shipping 16GB as the _lowest_ amount of RAM you can get on your Pixel 9. That was a clear flag that they're going to be running more and more models locally on your device.

As mentioned, while it seems unlikely that US-based model makers or device manufacturers will use OSS models, they'll certainly be targeting local models heavily on consumer devices in the near future.

Apple's framework of local first, escalating to ChatGPT if the query is complex, will be the dominant pattern IMO.


>Google started shipping 16GB as the _lowest_ amount of RAM you can get on your Pixel 9.

The Pixel 9 has 12GB of RAM[0]. You probably meant the Pixel 9 Pro.

[0]: https://www.gsmarena.com/google_pixel_9-13219.php


Still an absurd amount of RAM for a phone, imo


Not absurd. The base S21 Ultra from 2021 already shipped with 12GB of RAM. Four years later and the amount of RAM is still the same.


Seems about right; my new laptop has 8x that, which is about the same ratio my last new laptop had to my phone at the time.


Device makers also get to sell you a new device when you want a more powerful LLM.


Bingo!


I’m highly interested in local models for privacy reasons. In particular, I want to give an LLM access to my years of personal notes and emails, and answer questions with references to those. As a researcher, there’s lots of unpublished stuff in there that I sometimes either forget or struggle to find again due to searching for the wrong keywords, and a local LLM could help with that.

I pay for ChatGPT and use it frequently, but I wouldn’t trust uploading all that data to them even if they let me. I’ve so far been playing around with Ollama for local use.


~80% of the basic questions I ask of LLMs[0] work just fine locally, and I’m happy to ask twice for the other 20% of queries for the sake of keeping those queries completely private.

[0] Think queries I’d previously have had to put through a search engine and check multiple results for a one word/sentence answer.


"Because you can and its cool" would be reason enough: plenty of revolutions have their origin in "because you can" (Wozniak right off the top of my head, Gates and Altair, stuff like that).

But uncensored is a big deal too: censorship is capability-reducing (check out Kilcher's GPT4Chan video and references, the Orca work, and the Dolphin de-tune lift on SWE-Bench-style evals). We pay dearly in capability to get "non-operator-alignment", and you'll notice that competition is hot enough now that at the frontier (Opus, Qwen) the "alignment" away from what operators want is getting very, very mild.

And then there's the compression. Phi-3 or something works on a beefy laptop and carries a nontrivial approximation of "the internet" that works on an airplane or a beach with no network connectivity. Talk about vibe coding? I like those look-up-all-the-docs-via-a-thumbdrive-in-Phuket vibes.

And on diffusion stuff, SOTA fits on a laptop or close to it; you can crush OG Midjourney or SD on a MacBook. It's an even smaller gap.

Early GPT-4-ish outcomes are possible on a MacBook Pro or Razer Blade, so either 12-18-month-old LLMs are useless, or GGUF is useful.

The AI goalposts thing cuts both ways. If AI is "whatever only Anthropic can do"? That's just as silly as "whatever a computer can't do", and a lot more cynical.


Why do any compute locally? Everything can just be cloud based right? Won't that work much better and scale easily?

We are not even at that extreme and you can already see the unequal reality that too much SaaS has engendered


> Won't that work much better and scale easily?

Doing computation that can happen at the endpoints at the endpoints is massively more scalable. Even better, it's done by compute you usually aren't paying for if you're the company providing the service.

I saw an interview with the guy who made Photopea where he talked about how tiny his costs were because all compute was done in the user's browser. Running a SaaS in the cloud is expensive.

It's an underrated aspect of what we used to call "software".

And that's leaving aside questions of latency and data privacy.


Comcast comes to mind ;-)


Real talk. I'm based in San Juan and while in general having an office job on a beautiful beach is about as good as this life has to offer, the local version of Comcast (Liberty) is juuusst unreliable enough that I'm buying real gear at both the office and home station after a decade of laptop and go because while it goes down roughly as often as Comcast, its even harder to get resolved. We had StarLink at the office for like 2 weeks, you need a few real computers lying around.


I'm excited to do just dumb and irresponsible things with a local model, like "iterate through every single email in my 20-year-old gmail account and apply label X if Y applies" and not have a surprise bill.

I think it can make LLMs fun.
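For what it's worth, the dumb-and-irresponsible version is only a dozen lines once the mail is exported locally. A rough sketch, where the mbox path, the model tag, and the stand-in for "Y applies" are all placeholders:

  # Rough sketch: walk a local mbox export and ask a local model a yes/no question
  # per message. Path, model tag, and the labeling rule are placeholders.
  import mailbox, requests

  PROMPT = "Does this email mention travel bookings? Answer only YES or NO.\n\n{body}"

  for msg in mailbox.mbox("all_mail.mbox"):
      body = (msg.get_payload(decode=True) or b"")[:4000].decode(errors="ignore")
      resp = requests.post(
          "http://localhost:11434/api/generate",
          json={"model": "gemma3", "prompt": PROMPT.format(body=body), "stream": False},
      ).json()
      if resp["response"].strip().upper().startswith("YES"):
          print("would label:", msg["subject"])  # the real thing would call the Gmail API here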


I wrote a script to get my local Gemma 3 instance to tag and rename everything in my meme folder. :P


People like myself who firmly believe there will come a time, possibly very soon, when all these companies (OpenAI, Anthropic, etc.) will raise their prices substantially. By then no one will be able to do their work to the standard expected of them without AI, and maybe they'll charge $1k per month, maybe $10k. If there is no viable alternative, the sky is the limit.

Why do you think they continue to run at a loss? Out of the goodness of their hearts? Their biggest goal is to discourage anyone from running local models. The hardware is expensive... The way to run models is very difficult (for example, I have dual RTX 3090s for VRAM, and running large, heavily quantized models is a real pain in the arse; no high quantisation library supports two GPUs for example, and there seems to be no interest in implementing it by the guys behind the best inference tools).

So this is welcome, but let's not forget why it is being done.


> no high quantisation library supports two GPUs for example, and there seems to be no interest in implementing it by the guys behind the best inference tools

I'm curious to hear what you're trying to run, because I haven't used any software that is not compatible with multiple GPUs.


Pornography, or any other "restricted use". They either want privacy or don't want to deal with the filters on commercial products.

I'm sure there are other use cases, but much like "what is BitTorrent for?", the obvious use case is obvious.


A local laptop of the past few years without a discrete GPU can run, at practical speeds depending on task, a gemma/llama model if it's (ime) under 4GB.

For practical RAG processes of narrow scope, with even a minimal amount of scaffolding, that's a very usable speed for automating tasks, especially as the last-mile/edge-device portion of a more complex process with better models in use upstream. Classification tasks, reasonably intelligent decisions between traditional workflow steps, other use cases - all of them extremely valuable in enterprise, being built and deployed right now.


If you wanna compare on an H200 and play with TRT-LLM configs, I set up this link here: https://brev.nvidia.com/launchable/deploy?launchableID=env-3...


One of my favorite use cases includes simple tasks like generating effective mock/masked data from real data. Then passing the mock data worry-free to the big three (or wherever.)

There’s also a huge opportunity space for serving clients with very sensitive data. Health, legal, and government come to mind immediately. These local models are only going to get more capable of handling their use cases. They already are, really.


I'm guessing its largely enthusiasts for now, but as they continue getting better:

1. App makers can fine tune smaller models and include in their apps to avoid server costs

2. Privacy-sensitive content can be either filtered out or worked on... I'm using local LLMs to process my health history for example

3. Edge servers can be running these fine tuned for a given task. Flash/lite models by the big guys are effectively like these smaller models already.


Data that can't leave the premises because it is too sensitive. There is a lot of security theater around cloud pretending to be compliant but if you actually care about security a locked server room is the way to do it.


I can provide a real-world example: Low-latency code completion.

The JetBrains suite includes a few LLM models on the order of a hundred megabytes. These models are able to provide "obvious" line completion, like filling in variable names, as well as some basic predictions, like realising that the `if let` statement I'm typing out is going to look something like `if let Some(response) = client_i_just_created.foobar().await`.

If that was running in The Cloud, it would have latency issues, rate limits, and it wouldn't work offline. Sure, there's a pretty big gap between these local IDE LLMs and what OpenAI is offering here, but if my single-line autocomplete could be a little smarter, I sure wouldn't complain.


I don't have latency issues with GitHub Copilot. Maybe I'm less sensitive to it.


Just imagine the next PlayStation or XBox shipping with these models baked in for developer use. The kinds of things that could unlock.


Good point. Take the state of the world and craft NPC dialogue, for instance.


Yep that’s my biggest ask tbh. I just imagine the next Elder Scrolls taking advantage of that. Would change the gaming landscape overnight.


Games with LLM characters have been done and it turns out this is a shit idea.


There are a ton of ways to do this that haven't been tried yet.


I guarantee anything that’s already been put out is too early, and is very likely a rushed cash-grab. Which, of course that sucks.

And AI has been in games for a long time. Generated terrain and other sorts of automation have been used as techniques for a hot minute now.

All I’m suggesting is to keep on that same trajectory, now just using an on-device LLM to back intelligence features.


Sounds like a pre-Beatles "guitar groups are on their way out" kind of statement


> I’m still trying to understand what is the biggest group of people that uses local AI (or will)?

Creatives? I am surprised no one's mentioned this yet:

I tried to help a couple of friends with better copy for their websites, and quickly realized that they were using inventive phrases to explain their work, phrases that they would not want competitors to get wind of and benefit from; phrases that associate closely with their personal brand.

Ultimately, I felt uncomfortable presenting the cloud AIs with their text. Sometimes I feel this way even with my own Substack posts, where I occasionally coin a phrase I am proud of. But with local AI? Cool...


> I tried to help a couple of friends with better copy for their websites, and quickly realized that they were using inventive phrases to explain their work, phrases that they would not want competitors to get wind of and benefit from; phrases that associate closely with their personal brand.

But... they're publishing a website. Which competitors will read. Which chatbots will scrape. I genuinely don't get it.


there's a difference between an internal brief and a public copy.


I do it because 1) I am fascinated that I can and 2) at some point the online models will be enshittified — and I can then permanently fall back on my last good local version.


love the first and am sad you’re going to be right about the second


When it was floated about that the DeepSeek model was to be banned in the U.S., I grabbed it as fast as I could.

Funny how that works.


I mean, there's always torrents


I expect so. Still, it was easy to not have to even think about that.


In some large, lucrative industries like aerospace, many of the hosted models are off the table due to regulations such as ITAR. There's a market for models which are run on-prem/in GovCloud with a professional support contract for installation and updates.


I'm in a corporate environment. There's a study group to see if maybe we can potentially get some value out of those AI tools. They've been "studying" the issue for over a year now. They expect to get some cloud service that we can safely use Real Soon Now.

So, it'll take at least two more quarters before I can actually use those non-local tools on company related data. Probably longer, because sense of urgency is not this company's strong suit.

Anyway, as a developer I can run a lot of things locally. Local AI doesn't leak data, so it's safe. It's not as good as the online tools, but for some things it's better than nothing.


If you have capable hardware and kids, a local LLM is great. A simple system prompt customisation (e.g. ‘all responses should be written as if talking to a 10 year old’) and knowing that everything is private go a long way, for me at least.


Local micro models are both fast and cheap. We tuned small models on our data set, and if the small model thinks content is a certain way, we escalate to the LLM.

This gives us really good recall at really low cloud cost and latency.
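For anyone curious what that cascade looks like in practice, here's a minimal sketch; the off-the-shelf classifier, threshold, and escalation stub below are stand-ins, not our actual tuned Qwen3 models:

  # Rough sketch of a small-model-first cascade: a cheap local classifier screens
  # everything, and only content it flags gets escalated to a big cloud LLM.
  from transformers import pipeline

  # Off-the-shelf sentiment model as a stand-in for a tuned micro model.
  screen = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

  def needs_escalation(text: str, threshold: float = 0.8) -> bool:
      result = screen(text[:512])[0]  # e.g. {"label": "POSITIVE", "score": 0.97}
      return result["label"] == "POSITIVE" and result["score"] >= threshold

  def escalate_to_cloud_llm(text: str) -> None:
      print("escalating to the big model:", text[:40])  # placeholder for the hosted-LLM call

  for doc in ["some content to screen", "more content to screen"]:
      if needs_escalation(doc):
          escalate_to_cloud_llm(doc)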


I'd love to try this on my data set - what approach/tools/models did you use for fine-tuning?


Everything is built in-house, unfortunately. Many of our small models are tuned Qwen3. But we mostly chose the model based on what was SOTA at the time we needed a model trained.


I would say any company that doesn't have their own AI developed. You always hear about companies "mandating" AI usage, but for the most part it's companies developing their own solutions/agents. No self-respecting company with tight opsec would allow a random "always-online" LLM that could just rip your codebase, either piece by piece or the whole thing at once if it's an IDE addon (or at least I hope that's the case). So yeah, I'd say locally deployed LLMs/agents are a game changer.


Jailbreaking, then running censored questions. Like DIY fireworks, analysis of papers that touch "sensitive topics", NSFW image generation; the list is basically endless.


At the company where I currently work, for IP reasons (and with the advice of a patent lawyer), nobody is allowed to use any online AIs to talk about or help with work, unless it's very generic research that doesn't give away what we're working on.

That rules out coding assistants like Claude, chat, tools to generate presentations and copy-edit documents, and so forth.

But local AI are fine, as long as we're sure nothing is uploaded.


The use case is building apps.

A small LLM can do RAG, call functions, summarize, create structured data from messy text, etc... You know, all the things you'd do if you were making an actual app with an LLM.

Yeah, chat apps are pretty cheap and convenient for users who want to search the internet and write text or code. But APIs quickly get expensive when inputting a significant amount of tokens.
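As a concrete example of the structured-data case above, a small local model behind ollama can be pushed into JSON mode. A rough sketch, where the model tag and the field names are just illustrative:

  # Rough sketch: use a small local model to turn messy free text into JSON.
  # The model tag and the schema are illustrative only.
  import json, requests

  messy = "mtg w/ Dana next Tues 3pm re: Q3 budget, she'll bring the vendor quotes"
  prompt = (
      "Extract person, day, time, and topic from the note below. "
      "Respond with JSON only.\n\n" + messy
  )

  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={"model": "qwen3:8b", "prompt": prompt, "format": "json", "stream": False},
  ).json()

  record = json.loads(resp["response"])  # e.g. {"person": "Dana", "day": "Tuesday", ...}
  print(record)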


Don't know about the biggest, but IMO the exciting thing about open models is the possibility of creating whole new things.

For example, "generate a heatmap of each token/word and how 'unexpected' they are" or "find me a prompt that creates the closest match to this text"

To be efficient both require access that is not exposed over API.
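The "unexpectedness heatmap" one really is just a few lines once you have the weights, since all it needs is the per-token log-probs. A rough sketch with transformers; the model name is arbitrary:

  # Rough sketch: score how "unexpected" each token of a text is under an open model
  # by reading per-token log-probabilities straight from the logits - the kind of
  # access a hosted chat API doesn't expose. Model name is arbitrary.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  name = "Qwen/Qwen2.5-0.5B"
  tok = AutoTokenizer.from_pretrained(name)
  model = AutoModelForCausalLM.from_pretrained(name)

  text = "The quick brown fox jumps over the lazy dog"
  ids = tok(text, return_tensors="pt").input_ids

  with torch.no_grad():
      logits = model(ids).logits                         # [1, seq_len, vocab]

  logprobs = torch.log_softmax(logits[0, :-1], dim=-1)   # position i predicts token i+1
  surprisal = -logprobs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)

  for token, s in zip(tok.convert_ids_to_tokens(ids[0, 1:]), surprisal.tolist()):
      print(f"{token:>12}  {s:6.2f}")   # higher = more surprising; feed these into a heatmap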


Use Case?

How about running one on this site but making it publicly available? A sort of outranet, and calling it HackerBrain?


There's a bunch of great reasons in this thread, but how about the chip manufacturers that are going to need you to need a more powerful set of processors in your phone, headset, computer. You can count on those companies to subsidize some R&D and software development.


The cloud AI providers have unacceptable variation in response time for things that need a predictable runtime.

Even if they did offer a defined latency product, you’re relying on a lot of infrastructure between your application and their GPU.

That’s not always tolerable.


>Students who don’t want to pay but somehow have the hardware?

That's me - well, not a student anymore. When toying with something, I much prefer not paying for each shot. My 12GB Radeon card can either run a decent model extremely slowly, or an idiotic but fast one. It's nice not dealing with rate limits.

Once you write a prompt that mangles an idiotic model into still doing the work, it's really satisfying. The same principle as working to extract the most from limited embedded hardware. Masochism, possibly.


> I’m still trying to understand what is the biggest group of people that will use local AI?

iPhone users in a few months – because I predict app developers will love cramming calls to the foundation models into everything.

Android will follow.


Some app devs use local models in local environments with LLM APIs to get up and running fast; then, when the app deploys, it switches to the big online models via environment vars.

In large companies this can save quite a bit of money.
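A sketch of that env-var switch, assuming both the local server and the hosted provider speak the OpenAI-compatible chat API (the variable names and model tags are made up):

  # Rough sketch: one client, two backends. Dev points at a local OpenAI-compatible
  # server (e.g. ollama's /v1 endpoint); prod points at the hosted provider.
  # Env var names and model tags are illustrative.
  import os
  from openai import OpenAI

  client = OpenAI(
      base_url=os.environ.get("LLM_BASE_URL", "http://localhost:11434/v1"),
      api_key=os.environ.get("LLM_API_KEY", "ollama"),   # local server ignores the key
  )
  model = os.environ.get("LLM_MODEL", "llama3.1:8b")

  reply = client.chat.completions.create(
      model=model,
      messages=[{"role": "user", "content": "ping"}],
  )
  print(reply.choices[0].message.content)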


One use nobody mentions is hybrid use.

Why not run all the models at home, maybe collaboratively or at least in parallel?

I'm sure there are use cases where the paid models are not allowed to collaborate or ask each other.

Also, other open models are gaining mindshare.


Privacy laws. Processing government paperwork with LLMs for example. There's a lot of OCR tools that can't be used, and the ones that comply are more expensive than say, GPT-4.1 and lower quality.


Anything involving the medical industry (HIPAA laws), national security (FedRAMP is such a pain to get that some military contractors are bypassing it to get quicker access to cloud tools), etc.

Besides that, we are moving towards an era where we won't need to pay providers a subscription every month to use these models. I can't say for certain whether or not the GPUs that run them will get cheaper, but the option to run your own model is game changing for more than you can possibly imagine.


Agencies / firms that work with classified data. Some places have very strict policies on data, which makes it impossible to use any service that isn't local and air-gapped.

example: military intel


People who want programmatic solutions that won't be rug-pulled.


I’d use it on a plane if there was no network for coding, but otherwise it’s just an emergency model for when the internet goes out - basically end-of-the-world scenarios.


Worth mentioning that today's expensive hardware will be built into the cheapest iPhone in less than 10 years.

That means it runs instantly, offline, and every token is free.


You’re asking the biggest group of people who would want to do this


Privacy and equity.

Privacy is obvious.

AI is going to be equivalent to all computing in the future. Imagine if only IBM, Apple and Microsoft ever built computers, and all anyone else ever had in the 1990s were terminals to the mainframe, forever.


I am all for the privacy angle, and while I think there's certainly a group of us, myself included, who care deeply about it, I don't think most people or enterprises will. I think most of those will go for the easy button and then wring their hands about privacy and security, as they have always done, while continuing to let the big companies do pretty much whatever they want. I would be so happy to be wrong, but aren't we already seeing it? Middle-of-the-night price changes, leaks of data, private things that turned out not to be... and yet!


I wring my hands twice a week about internet service providers: Comcast and Starlink. And I live in a nominally well-serviced metropolitan area.


> AI is going to be equivalent to all computing in the future.

Thanks, but I prefer my computing to be deterministic if at all possible.


Did you mean to type equality? As in, "everyone on equal footing"? Otherwise, I'm not sure how to parse your statement.


We use it locally for deep packet inspection.


Same as the internet: porn.


Psychs who don't trust AI companies


Maybe I am too pessimistic, but as an EU citizen I expect politics (or should I say Trump?) to prevent access to US-based frontier models at some point.


I am just a cheapskate that wants to scale back on all subscription costs. I fucking hate subscriptions.


air gaps, my man.



