
For autocomplete, Qwen 3.5 9B should be enough even at Q4_K_M. The upcoming coding/math Omnicoder-2 finetune might be useful (it should be released in a few days).

Either that or just load up Qwen3.5-35B-A3B-Q4_K_S. I'm serving it at about 40-50 t/s on an RTX 4070 Super 12GB + 64GB of RAM. The weights are 20.7GB plus KV cache (which should shrink soon with the upcoming addition of TurboQuant).
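
If you serve it with llama.cpp's llama-server (or anything else exposing an OpenAI-compatible endpoint), hitting it from Python is about this much code; the port and model alias below are just placeholders for whatever your setup uses:

    # Minimal sketch: talk to a locally served model through an
    # OpenAI-compatible endpoint (llama.cpp's llama-server exposes one).
    # The base_url, port, and model alias are assumptions -- adjust to
    # match your own server.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="qwen3.5-35b-a3b",  # hypothetical local model alias
        messages=[{"role": "user", "content": "Write a one-liner to reverse a string."}],
        max_tokens=128,
        temperature=0.2,
    )
    print(resp.choices[0].message.content)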


I am definitely looking forward to TurboQuant. It makes me feel like my current setup is an investment that could pay off over time. Imagine being able to run models like MiniMax M2.5 locally at Q4 levels. That would be swell.

I did the same a few months ago when I read that multiple big OSS Linux projects were moving to it, and it's been phenomenal so far.

It could just as easily be a $3000-4000 Strix Halo laptop.


If SPTM is active on the chip, we are not going to be getting Linux at all.


Loving this; great work! Do you talk about the process anywhere in more depth?


Thanks! I'm using the KIRI Engine addon in Blender to render splats from my photos (https://github.com/Kiri-Innovation/3dgs-render-blender-addon), then process the renders as I would my photography in Lightroom. There are lots of different photogrammetry tools for generating .ply files (the point cloud), like PolyCam (https://poly.cam).
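
If you're curious what's actually inside those .ply files, a quick way to poke at one from Python is the plyfile package; the filename here is a placeholder and the exact properties vary between exporters:

    # Rough sketch: inspect a splat / point-cloud .ply file.
    # Requires `pip install plyfile`. The filename is a placeholder,
    # and property names differ between exporters, so this just prints
    # whatever the file declares.
    from plyfile import PlyData

    ply = PlyData.read("scene.ply")  # hypothetical export from PolyCam etc.
    vertices = ply["vertex"]

    print(f"{vertices.count} points")
    print("properties:", [p.name for p in vertices.properties])

    # First point's raw values (x, y, z, colors; splats usually also
    # carry opacity, scale, and rotation fields).
    print(vertices.data[0])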


I remember thinking the same thing, and this article goes over most of the arguments - https://milvus.io/blog/why-im-against-claude-codes-grep-only...

When it came out, I think it was Boris from Anthropic who said they experimented a lot with vector search and grep just worked better.

You can try it out using the Claude-Context MCP - https://github.com/zilliztech/claude-context
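
If you want a feel for why the grep-only approach holds up, the core of it is tiny. Here's a rough pure-Python stand-in for ripgrep (the root dir, extensions, and pattern are just examples, and obviously not what Claude Code literally runs):

    # Minimal sketch of grep-style code retrieval: walk a repo,
    # regex-match each line, return file:line hits. A toy stand-in
    # for ripgrep; root, extensions, and pattern are examples.
    import os
    import re

    def grep(root: str, pattern: str, exts=(".py", ".ts", ".go")):
        rx = re.compile(pattern)
        hits = []
        for dirpath, _, files in os.walk(root):
            for name in files:
                if not name.endswith(exts):
                    continue
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        for lineno, line in enumerate(f, 1):
                            if rx.search(line):
                                hits.append(f"{path}:{lineno}: {line.rstrip()}")
                except OSError:
                    continue
        return hits

    for hit in grep(".", r"def handle_request")[:20]:
        print(hit)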


I built my own AI coding agent and do vector search and embeddings locally:

https://slidebits.com/isogen
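
The local embedding side doesn't need much either. Rough sketch with sentence-transformers (the model and snippets are just examples, not how the linked agent actually works):

    # Rough sketch of local embedding search: embed code snippets once,
    # then rank them against a query by cosine similarity. Model choice
    # and snippets are illustrative only.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small, runs fine on CPU

    snippets = [
        "def read_config(path): ...",
        "class RetryingHttpClient: ...",
        "def cosine_similarity(a, b): ...",
    ]
    corpus = model.encode(snippets, normalize_embeddings=True)

    query = model.encode(["where do we parse the config file?"],
                         normalize_embeddings=True)[0]
    scores = corpus @ query  # cosine similarity, since vectors are normalized
    for i in np.argsort(-scores):
        print(f"{scores[i]:.3f}  {snippets[i]}")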


I hope the upcoming DeepSeek coding model puts a dent in Anthropic’s armor. Claude 4.5 is by far the best/fastest coding model, but the company is just too slimy and is burning enough $$$ to guarantee enshittification in the near future.


I get way better results from Gemini fwiw.


Cerebras currently has GLM4.6 on it, and will be getting GLM4.7 soon.


Lol.


That's gone now. They do drops from time to time, but their compute platform is saturated.

