Hacker News | BloondAndDoom's comments

Doesn’t Hugging Face only show it for the model you’re looking at? Is there a page where HF actually suggests a model based on your hardware?

This is pretty cool and useful, but I only wish this was a website. I don’t like the idea of running an executable for something that can perfectly be done as a website. (Other than some minor features; tbh you can even enable CORS and still check the installed models from a web browser.)

Sounds like a fun personal project though.


>I only wish this was a website. I don’t like the idea of running an executable for something that can perfectly be done as a website.

The tool depends on hardware detection. From https://github.com/AlexsJones/llmfit?tab=readme-ov-file#how-... :

  How it works
  Hardware detection -- Reads total/available RAM via sysinfo, counts CPU cores, and probes for GPUs:

  NVIDIA -- Multi-GPU support via nvidia-smi. Aggregates VRAM across all detected GPUs. Falls back to VRAM estimation from GPU model name if reporting fails.
  AMD -- Detected via rocm-smi.
  Intel Arc -- Discrete VRAM via sysfs, integrated via lspci.
  Apple Silicon -- Unified memory via system_profiler. VRAM = system RAM.
  Ascend -- Detected via npu-smi.
  Backend detection -- Automatically identifies the acceleration backend (CUDA, Metal, ROCm, SYCL, CPU ARM, CPU x86, Ascend) for speed estimation.
A website running JavaScript is restricted by the browser sandbox, so it can't see the same low-level details such as total system RAM, the exact count of GPUs, etc.
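To illustrate the gap, here's a rough Python sketch (my own illustration, not the actual llmfit code) of the kind of local probing a native tool can do but browser JavaScript cannot. `nvidia-smi` is only consulted if it's actually on the PATH:

```python
import os
import shutil
import subprocess

def total_ram_bytes():
    """Total physical RAM via the POSIX sysconf interface."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

def nvidia_vram_mib():
    """Sum VRAM (MiB) across NVIDIA GPUs via nvidia-smi; None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if out.returncode != 0:
        return None
    return sum(int(line) for line in out.stdout.split())

print("CPU cores:", os.cpu_count())
print("RAM (GiB):", total_ram_bytes() // 2**30)
print("NVIDIA VRAM (MiB):", nvidia_vram_mib())
```

None of these calls are possible from sandboxed page JavaScript; `navigator.hardwareConcurrency` and friends only expose coarse, deliberately fuzzed values.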

To implement your idea as only a website and also work around the JavaScript limitations, a different kind of workflow would be needed. E.g. run the macOS system report to generate a .spx file, or run inxi on Linux to generate a hardware devices report... and then upload those to the website for analysis to derive an "LLM best fit". But those OS report files may still be missing some details that the GitHub tool gathers.

Another way is to have the website offer a bunch of hardware options where the user manually selects the combination. Less convenient, but then again, it has the advantage of enabling "what-if" scenarios for hardware the user doesn't actually have and is thinking of buying.

(To be clear, I'm not endorsing this particular GitHub tool. Just pointing out that an LLMfit website has technical limitations.)


That’s like 4 or 5 fields to fill in on a form. Way less intrusive than installing this thing.

It can become complicated when you run it inside a container.

Why would it need to be a container?

My ollama and GPU are in k8s.

Are you asking why people run things in a container?

No, I'm asking why a website where someone could fill in a few fields and get the optimal LLM for their hardware would need to run in a container. It's a web form.

I just discovered the other day that Hugging Face allows you to do exactly this.

With the caveat that you enter your hardware manually. But are we really at the point yet where people are running local models without knowing what they are running them on..?


> But are we really at the point yet where people are running local models without knowing what they are running them on..?

I can only speak for myself: it can be daunting for a beginner to figure out which model fits your GPU, as the model size in GB doesn't directly translate to your GPU's VRAM capacity.

There is value in learning what fits and runs on your system, but that's a different discussion.
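For anyone in the same boat, a back-of-the-envelope estimate helps: weights take roughly params × bits-per-weight ÷ 8 bytes, plus headroom for the KV cache and runtime buffers. This is my own rule of thumb, not the tool's formula; the 20% overhead is a guess, and real usage varies with context length:

```python
def est_vram_gib(params_billion: float, bits_per_weight: float,
                 overhead: float = 1.2) -> float:
    """Estimate VRAM needed: weight bytes plus ~20% for KV cache/buffers."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

# A 7B model at ~4.5 bits/weight (a Q4 quant incl. metadata)
# comes out under 5 GiB, so it fits an 8 GiB GPU:
print(f"{est_vram_gib(7, 4.5):.1f} GiB")
# The same model at fp16 (16 bits/weight) would not:
print(f"{est_vram_gib(7, 16):.1f} GiB")
```

This is why a "7B" model is not a 7 GB download problem: the quantization level, not the parameter count alone, decides whether it fits.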


The other nice part of huggingface’s setup is you can add theoretical hardware and search that way too.

People out there are probably vibecoding their usernames / passwords for websites. Don't underestimate dumb people.

Came across a website for this recently that may be worth a look https://whatmodelscanirun.com

It's wildly inaccurate for me.

I wouldn't mind a set of well-known Unix commands that produce a text output of your machine stats to paste into this hypothetical website of yours (think: neofetch?).
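Something like the following would cover the basics on Linux. This is my own suggestion of commands, not an established convention; the GPU lines degrade gracefully when `lspci` or `nvidia-smi` isn't installed:

```shell
# Paste-able hardware summary: cores, RAM, and whatever GPUs are visible.
nproc
free -g | awk '/Mem:/ {print $2 " GiB RAM"}'
(lspci | grep -Ei 'vga|3d|display') 2>/dev/null || true
(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader) 2>/dev/null || true
```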

Hugging Face has it built in.

Where?

In your preferences there is a "Local apps and hardware" section. I guess it's a little different: I just open a model's page and it shows which quants fit for the hardware I've configured.

I haven't seen a page on HF that'll show me "what models will fit", it's always model by model. The shared tool gives a list of a whole bunch of models, their respective scores, and an estimated tok/s, so you can compare and contrast.

I wish it didn't require running on the machine, though. Just let me define my specs on a web page and spit out the results.


Here's a website for a community-run DB of LLM models with details on configs and their tokens/s: https://inferbench.com/

Great idea with inferbench (similar to Geekbench, etc.), but as of the time of writing it's got only 83 submissions, which is underwhelming.

The whole point is to measure your hardware capability. How would you do that as a website?

Always liked this website that kinda does something similar: https://apxml.com/tools/vram-calculator

This is what I used to do, and then life got complicated.

ADHD fish memory doesn't help either.


Unfortunately it became so common we don’t even care anymore; one of those things that got normalized.

I’m deleting my account as well. Is there a way to export all chats to Claude, or just download them to later load into a local LLM?

edit: Profile > Settings > Data Control > Export

Unfortunately Claude doesn't seem to have any way to import these chats: no SDK, no native way of doing it, and I cannot think of a way other than hacky browser automation, which might even trigger a ban.

If anyone figures this out please share.


You will probably never be able to turn your OpenAI chats into actual Claude chats, but you could ask Claude to read and distill your old OpenAI chats into Claude chat context. It won’t be the same, but it’s better than nothing, depending on what you’re hoping to get out of it.

The real story here: your doctor actually listened to you. I appreciate what a lot of doctors do, but the majority of them are fucking irritating and don’t even listen to your issues. I’m glad we have AI and are less reliant on them.

It is not a doctor's job to listen, smile, or be nice. Their job is to fix you.

I mean, obviously if they're not listening, their chance of the latter is pretty low.

Doctors hate to hear this, but if you're so poor in communication and social skills that the patient can't/won't follow the care you've given, your value is lost.


Exactly my experience. I know they vibe-code features and that’s fine, but it looks like they don’t do proper testing, which is surprising to me, because all you need is a bunch of cheap interns to do some decent enough testing.

No, there is a wide gap between good and bad testers. Great testers are worth their weight in gold and delight in ruining programmers' days all day long. IMO not a good place to skimp and a GREAT place to spend for talent.

> Great testers are worth their weight in gold and delight in ruining programmers' days all day long.

Side note: all the great testers I've known, from when my employers had separate QA departments, ended up becoming programmers, either by studying on the side or through in-house mentorship. By all secondhand accounts they've become great programmers too.


So true. My first job was in QA. Involuntarily, because I applied for a dev role, but they only had an opening in QA. I took the job because of the shiny company name on my resume. It totally changed my perspective on quality and finding issues. Even though I liked the job, it had some negative vibes, because you are always the guy bringing bad news / criticizing other people's work (more or less). Also, some developers couldn't react professionally to me finding bugs in their code. One dev team lead called me "persona non grata" when I came over to their desk. I took it with pride. Eventually I transitioned to development, because I did not see any career path for me in QA (team lead positions were filled with people who had been doing the job for 20+ years).

> they don’t do proper testing

They brought down production because the version string was changed incorrectly to add an extra date. That would have been picked up by even the most basic testing, since the app couldn't even start.

https://news.ycombinator.com/item?id=46532075

The fix (not even a PR or commit message to explain) https://github.com/anthropics/claude-code/commit/63eefe157ac...

No root cause analysis either https://github.com/anthropics/claude-code/issues/16682#issue...
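For context, "the most basic testing" here can literally be one assertion. A throwaway sketch (my illustration, not Anthropic's actual test suite): launch the binary, check it starts and reports a sane version.

```python
import re
import subprocess
import sys

def smoke_test(cmd):
    """Fail loudly if the binary won't even start or report a version."""
    out = subprocess.run(cmd + ["--version"], capture_output=True, text=True)
    assert out.returncode == 0, f"{cmd[0]} failed to launch: {out.stderr}"
    assert re.search(r"\d+\.\d+\.\d+", out.stdout + out.stderr), \
        "no version string in output"

# Works against any CLI with a --version flag,
# e.g. the Python interpreter itself:
smoke_test([sys.executable])
```

A version-string bug that stops the app from launching at all would trip the very first assertion.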


That's not true. Even testing needs to be done thoroughly now, because standards are high.

From where I'm sitting, the standards in software have never been lower.

I don't think you need any qualifications to run the app and realize that it doesn't run.

This is the bar we're at now.


> all you need is a bunch of cheap interns to do some decent enough testing

Sounds like a problem AI can easily solve!


Isn’t something like the keyring library better? Not that any of this would protect against AI if the agent is really after it.

I mean, their whole existence is about token prediction, so they just want to do their thing :)

I mean, if you are not connecting it to the real things, why even bother? Just use ChatGPT or Claude online at that point.

We have enough assistants; the key idea with OpenClaw is that it can do stuff with what you have, instead of just talking. It’s terrible security, but that’s the only way it makes sense. Otherwise it’s just a lot of hoops to combine cron jobs with an AI agent in the cloud that can do things and report back.

Not that I think anyone should do it; it’s a recipe for disaster.


Yeah, it's like saying you can hire a con artist as your personal assistant as long as they work from a sealed box and just pass little reviewed paper slips back and forth through a slit. Why have one at that point? Very difficult to be 'assisted' without granting access.
