There’s some blue text with an underline labeled “AV2 Specification”. That’s called a link. If you click that, you’ll see the date the spec was ratified and some details about it.
Are you suggesting that I read the whole spec and then read the whole AV1 spec and diff it in my head? Or are you referring to the only text in that link describing differences with AV1: "enhanced support for AR/VR applications, split-screen delivery of multiple programs, improved handling of screen content, and an ability to operate over a wider visual quality range"? This is not a description of technical features, it's a high level statement of aspirations. I'm asking what features they added to achieve these goals.
I was hoping someone familiar with AV2 might be frequenting this site alongside the much larger population of smartass pedants, and they might be able to summarize the new features in a way useful to me and others.
The fix appears to be nicely asking the forgetful, unreliable agent to please (very closely, pretty please!) follow the deploy instructions and not deploy too often per hour (and also to please never hallucinate or mess up, because statistics tells us an entity with no long-term memory and no incentive to get everything right will do the job correctly 99.99999999% of the time, which is good enough to run an eshop).
With one simple instruction the system (99.9999% of the time) gains the handy property that “only” two processes end up with the database files open at once.
I have to work with agents as part of my job, and the very first thing I did when writing MCP tools for my workflow was to ensure they were read-only or had a deterministic, hardcoded stopgap that evaluates their output.
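A minimal sketch of that "deterministic, hardcoded stopgap" idea, assuming a plain Python harness: wrap a tool so its output must pass a non-LLM check before anything downstream sees it. The names here (`guarded_tool`, `fetch_config`, the validator) are illustrative, not from any real MCP SDK.

```python
from typing import Any, Callable

class ToolOutputRejected(Exception):
    """Raised when a tool's output fails the deterministic check."""

def guarded_tool(validator: Callable[[Any], bool]):
    """Decorator: run a hardcoded validator over the tool's output."""
    def wrap(tool: Callable[..., Any]) -> Callable[..., Any]:
        def inner(*args, **kwargs):
            result = tool(*args, **kwargs)
            if not validator(result):
                raise ToolOutputRejected(f"{tool.__name__} output failed validation")
            return result
        return inner
    return wrap

# Example: a read-only lookup tool whose output must be a non-empty string.
@guarded_tool(lambda out: isinstance(out, str) and len(out) > 0)
def fetch_config(key: str) -> str:
    configs = {"region": "eu-west-1"}  # hypothetical config store
    return configs.get(key, "")
```

The point is that the gate is boring, deterministic code: the agent can't talk its way past it.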
I do not understand the level of carelessness and lack of thinking displayed in the OP.
Even just having the agent write scripts to disk and run those works wonders. It saves the agent from rebuilding the same script for recurring tasks, etc.
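That pattern can be sketched in a few lines, assuming a Python harness: cache agent-generated scripts by task name, so the same task reuses the saved file instead of regenerating it. The directory name and `run_task` helper are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path

SCRIPTS_DIR = Path("agent_scripts")  # hypothetical cache location

def run_task(name: str, generate) -> str:
    """Run the cached script for `name`, asking the agent to generate it on first use."""
    SCRIPTS_DIR.mkdir(exist_ok=True)
    script = SCRIPTS_DIR / f"{name}.py"
    if not script.exists():
        script.write_text(generate())  # the agent writes the script exactly once
    out = subprocess.run([sys.executable, str(script)],
                         capture_output=True, text=True, check=True)
    return out.stdout

# First call generates and saves; later calls just execute the saved file.
print(run_task("hello", lambda: 'print("hi from cached script")'))
```

After the first run, `generate` is never consulted again for that task name, which is exactly the "stop rebuilding the same script" win.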
That too! Every time the agent does something I didn't intend, I end up making a tool or process guidance to prevent it from happening again. Not just add "don't do that" to the context.
There are some patterns here that everyone using AI should build in. The examples below are things I’m doing in my harness, which enhances Claude Code (https://codeleash.dev).
1. Self-correction (human out of the loop) - give the AI opportunities to see the mistakes it made and correct them. Think linting, but your agent wrote the linter weeks ago, it’s code, and its output is fed back with line numbers and recommended fixes. Maybe you have a specific architecture - why not have your agent write a script that walks your entire codebase’s AST and flags violations? If you guarantee that check gets run, you’ll never see a violation again, because the agent will fix them before declaring itself done. Bye bye, dumb AI mistakes.
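An AST-walking rule checker like the one described can be tiny with Python's stdlib `ast` module. The rule below (no wildcard imports) is just a stand-in; a real harness would encode its own architecture rules and feed the flagged lines back to the agent.

```python
import ast

def find_violations(source: str, filename: str = "<mem>") -> list[str]:
    """Walk one file's AST and report rule violations with line numbers."""
    tree = ast.parse(source, filename=filename)
    problems = []
    for node in ast.walk(tree):
        # Example rule: forbid `from x import *`
        if isinstance(node, ast.ImportFrom) and any(a.name == "*" for a in node.names):
            problems.append(f"{filename}:{node.lineno}: wildcard import from {node.module}")
    return problems

code = "from os.path import *\nx = 1\n"
print(find_violations(code, "app.py"))
# → ['app.py:1: wildcard import from os.path']
```

Run this over every file in the repo and pipe the output back to the agent, and "the check gets run" is a guarantee the harness enforces, not a promise the model makes.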
2. Success criteria - a more subjective version of self-correction. When the agent declares itself done, automatically run a process (agentic or code or both) that determines whether the agent’s job is truly done. For example, make your coding agent harness’s stop hook run the full test suite and feed back any failures. If the agent actually gets to finish its work and your harness notifies you, you have certainty that your tests all pass. For extra credit, you can have your harness tell the agent to review its own work against a self-review checklist. I’ve got this so dialed in that the only time I interact with an agent is to approve its initial plan, then again to approve the plan it comes up with after its self-review.
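A sketch of that stop-hook gate, under the assumption your harness can run a script when the agent tries to stop. In Claude Code, a hook that exits with code 2 blocks the action and feeds stderr back to the agent (check your harness's docs); the placeholder command below stands in for a real test suite such as `python -m pytest -q`.

```python
import subprocess
import sys

def gate(test_cmd: list[str]) -> tuple[bool, str]:
    """Return (passed, combined output) for the project's test command."""
    proc = subprocess.run(test_cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def stop_hook(test_cmd: list[str]) -> int:
    """Hook exit status: 0 lets the agent stop, 2 blocks it with feedback."""
    ok, output = gate(test_cmd)
    if not ok:
        print(output, file=sys.stderr)  # failures go back to the agent
        return 2
    return 0

# Harmless placeholder; a real hook would run the full test suite here.
print(stop_hook([sys.executable, "-c", "print('all tests pass')"]))  # → 0
```

Because the gate runs deterministically on every stop attempt, "done" means "the tests pass", not "the model said so".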
3. Self-reflection & continuous improvement. As the agent works, the harness should be generating logs of its work. Did it try to escape the test-driven-development state machine? Did it edit with shell commands instead of edit tools? Just before the agent is about to drop context (hits compaction or stops work), ask it to output details of anything it learned, as well as having it review its own work logs. These learnings can then be used to improve the harness, improve the self-review checklist, improve the docs (agent OR human docs), improve the automations, uncover process gaps, or even guide you to code refactorings so the agent doesn’t get surprises in future.
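The logging side of point 3 can be as simple as a JSONL work log the harness appends to, plus a summarizer that runs before the agent drops context. Event names like `shell_edit` and `tdd_escape` are made up here to mirror the examples above; a real harness would define its own vocabulary.

```python
import json
from collections import Counter
from io import StringIO

def summarize_log(lines) -> dict[str, int]:
    """Count logged events that indicate the agent skirted the process."""
    flagged = {"shell_edit", "tdd_escape"}  # hypothetical violation events
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("type") in flagged:
            counts[event["type"]] += 1
    return dict(counts)

# In practice `lines` would be an open JSONL file written during the session.
log = StringIO(
    '{"type": "edit_tool"}\n'
    '{"type": "shell_edit"}\n'
    '{"type": "shell_edit"}\n'
)
print(summarize_log(log))  # → {'shell_edit': 2}
```

The summary is what feeds the improvement loop: a spike in `shell_edit` counts is a concrete signal to tighten tooling or docs, rather than a vague feeling that the agent misbehaved.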
The result is you can trust the output. It’s all about putting a deterministic shell around the AI, by supervising a process while it does the work - because the surface area and complexity of a powerful process is so much lower than that of the work itself.
It’s just like management: set up guardrails, define the outcome, monitor the process, trust that it’ll lead to quality work, continuously improve everything at every opportunity.
The author calls it an ecosystem at one point. That’s overselling it.
I suspect “Copilot” is cargo culted naming across disparate parts of an org that’s home to upwards of 100,000 engineers who must all justify their latest bump in your subscription cost.
It’s amazing how much product Microsoft ships - that’s 95% of the thing... unfortunately the last 5% is the product polish that’d make their stuff actually good. :(
It’ll be interesting to see if Apple comes around on customization of apps in general, because hopefully that’ll soon be what users expect.
In a world where users expect to be able to customize software more and more, apps start to look quite rigid, and open platforms like the web that offer flexibility start to look more appealing.
Imagine a Lovable-style PWA that morphs into the app you vibecoded by storing the generated code in localStorage, for example - with cloud fallbacks to re-download the code if the storage is wiped.
Linux and Windows have always been a lot more customisable; Apple was always the "we know better than you what you want" company... And, often enough, they weren't wrong.
I agree! Right now it is leveraging the Codex App Server, which is open-source and very well implemented, but using Claude Code Channels is probably a bit hacky.
The good thing is that it establishes a direct connection so it's already much better than having one agent spawn the other and wait for its output, or read/write to a shared .md file -- but it would be cool to make it work for all agent harnesses.
Those with fond memories of a childhood spent playing games, typing in code from magazines, and having low-fidelity conversations with faraway like-minded folks need to know how lucky they were. These days that stuff may still be there, but for kids it pales next to addictive social media.