I was testing the 4-bit Qwen3 Coder Next on my 395+ board last night. IIRC it was maintaining around 30 tokens a second even with a large context window.
I haven't tried Minimax M2.5 yet. How do its capabilities compare to Qwen3 Coder Next in your testing?
I'm working on getting a good agentic coding workflow going with OpenCode and I had some issues with the Qwen model getting stuck in a tool calling loop.
Minimax passed this test, which even some SOTA models don't pass. But I haven't tried any agentic coding yet.
I wasn't able to allocate the full context length for MiniMax with my current setup; I'm going to try quantizing the KV cache to see if I can fit the full context length into the RAM I've allocated to the GPU. Even at a 3-bit quant, MiniMax is pretty heavy. I need a big enough context window, otherwise it'll be less useful for agentic coding. With Qwen3 Coder Next, I can use the full context window.
Yeah, I've also seen the occasional tool call looping in Qwen3 Coder Next, that seems to be an easy failure mode for that model to hit.
OK, with MiniMax M2.5 UD-Q3_K_XL (101 GiB), I can't really seem to fit the full context even at smaller KV cache quants. Going much above 64k tokens, I start getting OOM errors when running Firefox and Zed alongside the model, or outright failures to allocate the buffers, even with a 4-bit KV cache quant (oddly, 8-bit worked better than 4- or 5-bit, but I still ran into OOM errors).
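For anyone else trying to budget context length: the KV cache footprint is roughly 2 x layers x KV heads x head dim x bytes-per-element x tokens. A quick sketch of that arithmetic; note the architecture numbers below are placeholders, not MiniMax's real values, so read the actual ones from your model's GGUF metadata:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem):
    """Approximate KV cache size: one K and one V tensor per layer per token."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem
    return total_bytes / 2**30

# Placeholder architecture numbers -- substitute the values from the
# GGUF metadata of whatever model you're actually running.
layers, kv_heads, head_dim = 60, 8, 128

# llama.cpp cache types: f16 = 2 bytes, q8_0 ~= 1.0625 bytes (8.5 bits),
# q4_0 ~= 0.5625 bytes (4.5 bits) per element.
for ctx in (65_536, 131_072):
    for name, bytes_per in (("f16", 2.0), ("q8_0", 1.0625), ("q4_0", 0.5625)):
        gib = kv_cache_gib(layers, kv_heads, head_dim, ctx, bytes_per)
        print(f"{ctx:>7} tokens @ {name}: {gib:5.1f} GiB")
```

With these made-up numbers, doubling context doubles the cache, and dropping from f16 to q8_0 roughly halves it, which matches the kind of squeeze you're describing.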
I might be able to squeeze a bit more out if I were running fully headless with my development on another machine, but I'm running everything on a single laptop.
So looks like for my setup, 64k context with an 8 bit quant is about as good as I can do, and I need to drop down to a smaller model like Qwen3 Coder Next or GPT-OSS 120B if I want to be able to use longer contexts.
After some more testing, yikes, MiniMax M2.5 can get painfully slow on this setup.
Haven't tried different things like switching between Vulkan and ROCm yet.
But anyhow, that 17 tokens per second was on an almost-empty context. By the time I got to around 30k tokens of context, it was down in the 5-10 tokens per second range, occasionally dropping all the way to 2 tokens per second.
Oh, and it looks like I'm sometimes filling up the KV cache, which forces it to drop the cache and start over fresh. Yikes, that's why it's getting so slow.
Qwen3 Coder Next is much faster. MiniMax's thinking/planning seems stronger, but Qwen3 Coder Next is pretty good at just cranking through a bunch of tool calls, poking around through code and docs, and just doing stuff. Also, while browsing around the project I'm in, MiniMax got confused by a few things that Qwen3 Coder Next picked up on, so it's not universally stronger.
Isn't it just the usual feedback loop that happens with popular podcasters? They have connections and get a few highly popular guests on. As long as their demeanor is agreeable and they keep the conversation interesting, other high-profile guests will agree to be on, and thus they've created a successful show.
I guess it was just a poetic riff on Tinder for AI agents. It seems like one of the more profound questions around AI and the singularity. One AI gaining sentience would be a big deal, for sure, but two self-aware AIs that could produce an offspring — that would be quite something.
The knee-jerk reaction to Moltbook is almost certainly "what a waste of compute" or "a security disaster waiting to happen". Both of those thoughts have merit and are worth considering, but we must acknowledge that something deeply fascinating is happening here. These agents are showing the early signs of swarm intelligence. They're communicating, learning, and building systems and tools together. To me, that's mind-blowing and not at all something I would have expected to happen this year.
> These agents are showing the early signs of swarm intelligence.
Ehhh... it's not that impressive, is it? I think it's worth remembering that you can get extremely complex behaviour out of Conway's Game of Life [0], which is as much of a swarm as this is, just with an unfathomably huge difference in the number of states any one part can be in. Any random smattering of cells in GoL is going to create a few gliders despite that difference in complexity.
The browser works shockingly well considering it was created in 72 hours. It can render Wikipedia well enough to read and browse articles. With some basic form handling and browser standards (url bar, history, bookmarks, etc) it would be a viable way to consume text based content.
I can't say my fingers (codex's fingers) haven't been itching to add some small features which would basically make it a viable browser for myself at least, for 90% of my browsing.
But I think this is one of those experiments that I need to put a halt to sooner rather than later, because the scope can always grow, my mind really likes those sorts of projects, and I don't have the time for that right now :)
It would be really cool if it were able to render Wikipedia correctly. I really like the idea of a browser with minimal dependencies that can navigate most static websites; for now this one compiles instantly and is incredibly small.
Yeah, my mind battled over which websites to use as examples for adding support; Wikipedia should have been an obvious one. That's on me!
You're not the only one to say this, so maybe there is value in a minimal HTML+CSS browser that still works with the modern (non-JS) web, although I'm not sure how much.
Another idea I had was to pile another experiment on top of this one, more about "N humans + N agents = one browser", in a collaborative fashion. Let's see if that ends up happening :)
Maybe you can divide the task into verifiable environments, like an HTML5 parser environment where an agent builds the parser and checks its progress against a test suite (https://github.com/html5lib/html5lib-tests in this case), then writes the resulting API into a .md. The human's job at the beginning would be to create the various environments where the agents build the components (and to decide how far it can be divided into standalone components).
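For concreteness: the tree-construction half of html5lib-tests is a simple line-oriented .dat format with `#data` / `#errors` / `#document` sections, so the harness an agent would need starts with just splitting a file into cases. A minimal sketch (the section semantics are simplified here; see the test suite's own docs for the full format):

```python
def split_dat_cases(text):
    """Split an html5lib-tests tree-construction .dat file into cases.

    Each case is a dict mapping a section name ('data', 'errors',
    'document', ...) to the list of lines under that #section header.
    """
    cases, current, section = [], None, None
    for line in text.split("\n"):
        if line.startswith("#"):
            name = line[1:]
            if name == "data":          # '#data' starts a new test case
                current = {}
                cases.append(current)
            section = name
            current.setdefault(section, [])
        elif current is not None:
            current[section].append(line)
    return cases

sample = """#data
<p>One
#errors
#document
| <html>
|   <head>
|   <body>
|     <p>
|       "One"

#data
<!doctype html>
#errors
#document
| <!DOCTYPE html>
| <html>
|   <head>
|   <body>"""

cases = split_dat_cases(sample)
print(len(cases))                   # 2 cases
print("\n".join(cases[0]["data"]))  # the first case's input markup
```

The agent's loop would then be: parse `data` with its HTML parser, serialize the tree, and diff against the `document` section.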
Thanks for the ideas, but I'll leave the torch for someone else to pick up; the goal was to get as far as possible within 3 days, with human steering, so I'm stopping here personally :)
I'll keep them in mind for the future, who knows, maybe some interesting iteration could be done on what's been made so far.
> The security hazards of artisanal hosting are more real than ever
How could this possibly be true? It's not at all rocket science to create a static blog and serve it via a production grade web server (nginx, etc).
> The UX of DNS hasn't improved (it's still impossible for normies)
The UX of DNS sucks but we're talking about a single A record. Is that not within reach of a normie in the age of AI?
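For concreteness, the whole job is typically one line in a zone file (or one form field in a registrar's DNS panel). The name and address here are placeholders, with the IP drawn from the documentation range:

```
; point blog.example.com at your server's IPv4 address
blog.example.com.   3600   IN   A   203.0.113.42
```

That's the entire DNS footprint of a static blog on a custom domain.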
> Custom domains are not just vain, they're ephemeral. Certainly more so than, say, the domain of a blogging platform that's managed by a non-profit.
I can't think of a single free blogging platform that has stood the test of time. Depending on centralized resources, particularly ones you're not paying for, is a recipe for ephemerality. And if you're going to pay for it, why can't you afford a domain?
What you’re saying makes no sense. Unless I'm misunderstanding it (and I am in fact a lawyer), the data indicates that 53% of the people have either been convicted of or charged with a crime.
Yes, people who are charged are innocent until proven guilty. Nobody is implying any of those charged people are guilty.
Fifty-three percent is a high share of any population to face criminal charges, innocent or guilty. That’s it.
What do you think is the delta between "charged" and "pending charge"?
What do you think the delta is between "general population", "pending charge", "charged", and "convicted" - while we're at it?
You can make a vast majority of the population "pending charge" at some point if you take into consideration they were - at one point - marked "pending bench warrant" during the time between getting a traffic ticket and (A) paying it; or (B) fighting it.
Let's see how far we can skew the datasets, with enough motivation. And the "benefit of the doubt" that you'd reflexively give someone on not stretching the truth simply CANNOT be afforded to these people.
No, this isn’t correct. A bench warrant is not a charge. Neither is an arrest warrant, but in any event most traffic violations will be civil and not criminal matters.
> You can make a vast majority of the population "pending charge" at some point if you take into consideration they were - at one point - marked "pending bench warrant" during the time between getting a traffic ticket and (A) paying it; or (B) fighting it.
This statement is incorrect. A "charge" means charged with a crime. You cannot make anyone "pending charge" if you issue them a bench warrant. Nor an arrest warrant.
In theory NPUs are a cheap, efficient alternative to the GPU for getting good speeds out of larger neural nets. In practice they're rarely used, because for simple tasks like blurring, speech to text, and noise cancellation you can usually do it on the CPU just fine. Power users doing really hefty stuff usually have a GPU anyway, so that gets used because it's typically much faster. That's exactly what happens with my AMD AI Max 395+ board. I thought maybe the GPU and NPU could work in parallel, but memory limitations mean that's often slower than just using the GPU alone. I think I read that the intended use case for the NPU is background tasks when the GPU is already loaded, but that seems like a very niche use case.
If the NPU happens to use less power for a given amount of TOPS, it's still a win, since compute-heavy workloads are most often limited by power and thermals, especially on mobile hardware. That frees up headroom for the iGPU. You're right about memory limitations, but those mostly matter for token generation, not prefill.
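Back-of-envelope for why memory matters for generation but not prefill: decoding one token has to stream every active weight through memory once, so decode speed is bounded by bandwidth divided by bytes read per token, regardless of how many TOPS the NPU or GPU offers. The numbers below are illustrative, not measurements of any specific chip:

```python
def decode_tok_per_s(bandwidth_gb_s, active_gb_per_token):
    """Rough upper bound on decode speed for a memory-bandwidth-bound
    model: each generated token reads all active weights once."""
    return bandwidth_gb_s / active_gb_per_token

# Illustrative: ~256 GB/s of LPDDR5x-class bandwidth, and a MoE model
# that activates ~5 GB of quantized weights per token.
print(f"~{decode_tok_per_s(256, 5):.0f} tok/s upper bound")  # ~51 tok/s
```

Prefill processes many tokens per weight read, so it stays compute-bound and is where extra TOPS (NPU or otherwise) can actually pay off.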