I was testing the 4-bit Qwen3 Coder Next on my 395+ board last night. IIRC it was maintaining around 30 tokens a second even with a large context window.
I haven't tried Minimax M2.5 yet. How do its capabilities compare to Qwen3 Coder Next in your testing?
I'm working on getting a good agentic coding workflow going with OpenCode and I had some issues with the Qwen model getting stuck in a tool calling loop.
Minimax passed this test, which even some SOTA models don't pass. But I haven't tried any agentic coding yet.
I wasn't able to allocate the full context length for MiniMax with my current setup; I'm going to try quantizing the KV cache to see if I can fit the full context length into the RAM I've allocated to the GPU. Even at a 3-bit quant, MiniMax is pretty heavy. I need a big enough context window, otherwise it'll be less useful for agentic coding. With Qwen3 Coder Next, I can use the full context window.
Yeah, I've also seen the occasional tool call looping in Qwen3 Coder Next, that seems to be an easy failure mode for that model to hit.
OK, with MiniMax M2.5 UD-Q3_K_XL (101 GiB), I can't really seem to fit the full context even at smaller KV cache quants. Going much above 64k tokens, I start getting OOM errors when running Firefox and Zed alongside the model, or outright failures to allocate the buffers, even with a 4-bit KV cache quant (oddly, 8-bit worked better than 4- or 5-bit, but I still ran into OOM errors).
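For anyone else trying to budget context length: the KV cache footprint is roughly 2 x layers x KV heads x head dim x bytes-per-element x tokens. A quick sketch of that arithmetic; note the architecture numbers below are placeholders, not MiniMax's real values, so read the actual ones from your model's GGUF metadata:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem):
    """Approximate KV cache size: one K and one V tensor per layer per token."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem
    return total_bytes / 2**30

# Placeholder architecture numbers -- substitute the values from the
# GGUF metadata of whatever model you're actually running.
layers, kv_heads, head_dim = 60, 8, 128

# llama.cpp cache types: f16 = 2 bytes, q8_0 ~= 1.0625 bytes (8.5 bits),
# q4_0 ~= 0.5625 bytes (4.5 bits) per element.
for ctx in (65_536, 131_072):
    for name, bytes_per in (("f16", 2.0), ("q8_0", 1.0625), ("q4_0", 0.5625)):
        gib = kv_cache_gib(layers, kv_heads, head_dim, ctx, bytes_per)
        print(f"{ctx:>7} tokens @ {name}: {gib:5.1f} GiB")
```

With these made-up numbers, doubling context doubles the cache, and dropping from f16 to q8_0 roughly halves it, which matches the kind of squeeze you're describing.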
I might be able to squeeze a bit more out if I were running fully headless with my development on another machine, but I'm running everything on a single laptop.
So looks like for my setup, 64k context with an 8 bit quant is about as good as I can do, and I need to drop down to a smaller model like Qwen3 Coder Next or GPT-OSS 120B if I want to be able to use longer contexts.
After some more testing, yikes, MiniMax M2.5 can get painfully slow on this setup.
Haven't tried different things like switching between Vulkan and ROCm yet.
But anyhow, that 17 tokens per second was on an almost-empty context. By the time I got to around 30k tokens of context, it was down in the 5-10 tokens per second range, occasionally dropping all the way to 2 tokens per second.
Oh, and it looks like I'm sometimes filling up the KV cache, which forces it to drop the cache and start over fresh. Yikes, that's why it's getting so slow.
Qwen3 Coder Next is much faster. MiniMax's thinking/planning seems stronger, but Qwen3 Coder Next is pretty good at just cranking through a bunch of tool calls, poking around through code and docs, and just doing stuff. Also, while browsing around the project I'm in, MiniMax got confused by a few things that Qwen3 Coder Next picked up on, so it's not universally stronger.
Isn't it just the usual feedback loop that happens with popular podcasters? They have connections and get a few highly popular guests on. As long as their demeanor is agreeable and they keep the conversation interesting, other high-profile guests will agree to be on, and thus they've created a successful show.
I guess it was just a poetic riff on Tinder for AI agents. It seems like one of the more profound questions around AI and the singularity. One AI gaining sentience would be a big deal, for sure, but two self-aware AIs that could produce an offspring — that would be quite something.
The knee-jerk reaction to Moltbook is almost certainly "what a waste of compute" or "a security disaster waiting to happen". Both of those thoughts have merit and are worth considering, but we must acknowledge that something deeply fascinating is happening here. These agents are showing the early signs of swarm intelligence. They're communicating, learning, and building systems and tools together. To me, that's mind-blowing and not at all something I would have expected to happen this year.
> These agents are showing the early signs of swarm intelligence.
Ehhh... it's not that impressive, is it? I think it's worth remembering that you can get extremely complex behaviour out of Conway's Game of Life [0], which is as much of a swarm as this is, just with an unfathomably huge difference in the number of states any one part can be in. Any random smattering of cells in GoL is going to create a few gliders despite that difference in complexity.
The browser works shockingly well considering it was created in 72 hours. It can render Wikipedia well enough to read and browse articles. With some basic form handling and browser standards (url bar, history, bookmarks, etc) it would be a viable way to consume text based content.
I can't say my fingers (codex's fingers) haven't been itching to add some small features which would basically make it a viable browser for myself at least, for 90% of my browsing.
But I think this is one of those experiments that I need to put a halt to sooner rather than later, because the scope can always grow, my mind really likes those sorts of projects, and I don't have the time for that right now :)
It would be really cool if it were able to render Wikipedia correctly. I really like the idea of a browser with minimal dependencies that can navigate most static websites; for now this one compiles instantly and is incredibly small.
Yeah, my mind battled over which websites to use as examples for adding support; Wikipedia should have been an obvious one. That's on me!
You're not the only one to say this, so maybe there is value in a minimal HTML+CSS browser that still works with the modern (non-JS) web, although I'm not sure how much.
Another idea I had was to pile another experiment on top of this one, more about "N humans + N agents = one browser", in a collaborative fashion. Let's see if that ends up happening :)
Maybe you can divide the task into verifiable environments, like an HTML5 parser environment where an agent builds the parser and checks its progress against a test suite (https://github.com/html5lib/html5lib-tests in this case), then writes the resulting API into a .md. The human's job at the beginning would be to create the various environments where the agents build the components (and to decide how far it can be divided into standalone components).
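For concreteness: the tree-construction half of html5lib-tests is a simple line-oriented .dat format with `#data` / `#errors` / `#document` sections, so the harness an agent would need starts with just splitting a file into cases. A minimal sketch (the section semantics are simplified here; see the test suite's own docs for the full format):

```python
def split_dat_cases(text):
    """Split an html5lib-tests tree-construction .dat file into cases.

    Each case is a dict mapping a section name ('data', 'errors',
    'document', ...) to the list of lines under that #section header.
    """
    cases, current, section = [], None, None
    for line in text.split("\n"):
        if line.startswith("#"):
            name = line[1:]
            if name == "data":          # '#data' starts a new test case
                current = {}
                cases.append(current)
            section = name
            current.setdefault(section, [])
        elif current is not None:
            current[section].append(line)
    return cases

sample = """#data
<p>One
#errors
#document
| <html>
|   <head>
|   <body>
|     <p>
|       "One"

#data
<!doctype html>
#errors
#document
| <!DOCTYPE html>
| <html>
|   <head>
|   <body>"""

cases = split_dat_cases(sample)
print(len(cases))                   # 2 cases
print("\n".join(cases[0]["data"]))  # the first case's input markup
```

The agent's loop would then be: parse `data` with its HTML parser, serialize the tree, and diff against the `document` section.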
Thanks for the ideas, but I'll leave the torch for someone else to pick up; the goal was to get as far as possible within 3 days, with human steering, so I'm stopping here personally :)
I'll keep them in mind for the future, who knows, maybe some interesting iteration could be done on what's been made so far.
> The security hazards of artisanal hosting are more real than ever
How could this possibly be true? It's not at all rocket science to create a static blog and serve it via a production grade web server (nginx, etc).
> The UX of DNS hasn't improved (it's still impossible for normies)
The UX of DNS sucks but we're talking about a single A record. Is that not within reach of a normie in the age of AI?
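For concreteness, the whole job is typically one line in a zone file (or one form field in a registrar's DNS panel). The name and address here are placeholders, with the IP drawn from the documentation range:

```
; point blog.example.com at your server's IPv4 address
blog.example.com.   3600   IN   A   203.0.113.42
```

That's the entire DNS footprint of a static blog on a custom domain.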
> Custom domains are not just vain, they're ephemeral. Certainly more so than, say, the domain of a blogging platform that's managed by a non-profit.
I can't think of a single free blogging platform that has stood the test of time. Depending on centralized resources, particularly ones you're not paying for, is a recipe for ephemerality. And if you're going to pay for it, why can't you afford a domain?
What you’re saying makes no sense. Unless I'm misunderstanding it (and I am in fact a lawyer), the data indicates that 53% of the people have either been convicted of or charged with a crime.
Yes, people who are charged are innocent until proven guilty. Nobody is implying any of those charged people are guilty.
Fifty-three percent is a high share of any population to face criminal charges, innocent or guilty. That’s it.
What do you think is the delta between "charged" and "pending charge"?
What do you think the delta is between "general population", "pending charge", "charged", and "convicted" - while we're at it?
You can make a vast majority of the population "pending charge" at some point if you take into consideration they were - at one point - marked "pending bench warrant" during the time between getting a traffic ticket and (A) paying it; or (B) fighting it.
Let's see how far we can skew the datasets, with enough motivation. And the "benefit of the doubt" that you'd reflexively give someone on not stretching the truth simply CANNOT be afforded to these people.
No, this isn’t correct. A bench warrant is not a charge. Neither is an arrest warrant, but in any event most traffic violations will be civil and not criminal matters.
> You can make a vast majority of the population "pending charge" at some point if you take into consideration they were - at one point - marked "pending bench warrant" during the time between getting a traffic ticket and (A) paying it; or (B) fighting it.
This statement is incorrect. A "charge" means charged with a crime. You cannot make anyone "pending charge" if you issue them a bench warrant. Nor an arrest warrant.
In theory NPUs are a cheap, efficient alternative to the GPU for getting good speeds out of larger neural nets. In practice they're rarely used, because for simple tasks like blurring, speech to text, and noise cancellation you can usually do it on the CPU just fine. Power users doing really hefty stuff usually have a GPU anyway, so that gets used because it's typically much faster. That's exactly what happens with my AMD AI Max 395+ board. I thought maybe the GPU and NPU could work in parallel, but memory limitations mean that's often slower than just using the GPU alone. I think I read that the intended use case for the NPU is background tasks when the GPU is already loaded, but that seems like a very niche use case.
If the NPU happens to use less power for a given amount of TOPS, it's still a win, since compute-heavy workloads are most often limited by power and thermals, especially on mobile hardware. That frees up headroom for the iGPU. You're right about memory limitations, but those mostly matter for token generation, not prefill.
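Back-of-envelope for why memory matters for generation but not prefill: decoding one token has to stream every active weight through memory once, so decode speed is bounded by bandwidth divided by bytes read per token, regardless of how many TOPS the NPU or GPU offers. The numbers below are illustrative, not measurements of any specific chip:

```python
def decode_tok_per_s(bandwidth_gb_s, active_gb_per_token):
    """Rough upper bound on decode speed for a memory-bandwidth-bound
    model: each generated token reads all active weights once."""
    return bandwidth_gb_s / active_gb_per_token

# Illustrative: ~256 GB/s of LPDDR5x-class bandwidth, and a MoE model
# that activates ~5 GB of quantized weights per token.
print(f"~{decode_tok_per_s(256, 5):.0f} tok/s upper bound")  # ~51 tok/s
```

Prefill processes many tokens per weight read, so it stays compute-bound and is where extra TOPS (NPU or otherwise) can actually pay off.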