There’s some blue text with an underline labeled “AV2 Specification”. That’s called a link. If you click that, you’ll see the date the spec was ratified and some details about it.
Are you suggesting that I read the whole spec and then read the whole AV1 spec and diff it in my head? Or are you referring to the only text in that link describing differences with AV1: "enhanced support for AR/VR applications, split-screen delivery of multiple programs, improved handling of screen content, and an ability to operate over a wider visual quality range"? This is not a description of technical features, it's a high level statement of aspirations. I'm asking what features they added to achieve these goals.
I was hoping someone familiar with AV2 might be frequenting this site alongside the much larger population of smartass pedants, and they might be able to summarize the new features in a way useful to me and others.
The fix appears to be nicely asking the forgetful, unreliable agent to please (very closely, pretty please!) follow the deploy instructions and not deploy too often per hour (and also to please never hallucinate or mess up, because statistics tells us an entity with no long-term memory and no incentive to get everything right will do the job correctly 99.99999999% of the time, which is good enough to run an eshop).
With one simple instruction the system (99.9999% of the time) gains the handy property that “only” two processes end up with the database files open at once.
I have to work with agents as part of my job, and the very first thing I did when writing MCP tools for my workflow was to ensure they were read-only or had a deterministic, hardcoded stopgap that evaluates their output.
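A minimal sketch of that "deterministic, hardcoded stopgap" idea, assuming a plain Python harness: wrap a tool so its output must pass a non-LLM check before anything downstream sees it. The names here (`guarded_tool`, `fetch_config`, the validator) are illustrative, not from any real MCP SDK.

```python
from typing import Any, Callable

class ToolOutputRejected(Exception):
    """Raised when a tool's output fails the deterministic check."""

def guarded_tool(validator: Callable[[Any], bool]):
    """Decorator: run a hardcoded validator over the tool's output."""
    def wrap(tool: Callable[..., Any]) -> Callable[..., Any]:
        def inner(*args, **kwargs):
            result = tool(*args, **kwargs)
            if not validator(result):
                raise ToolOutputRejected(f"{tool.__name__} output failed validation")
            return result
        return inner
    return wrap

# Example: a read-only lookup tool whose output must be a non-empty string.
@guarded_tool(lambda out: isinstance(out, str) and len(out) > 0)
def fetch_config(key: str) -> str:
    configs = {"region": "eu-west-1"}  # hypothetical config store
    return configs.get(key, "")
```

The point is that the gate is boring, deterministic code: the agent can't talk its way past it.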
I do not understand the level of carelessness and lack of thinking displayed in the OP.
Even just having the agent write scripts to disk and run those works wonders. It saves the agent from rebuilding the same script for recurring tasks, etc.
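That pattern can be sketched in a few lines, assuming a Python harness: cache agent-generated scripts by task name, so the same task reuses the saved file instead of regenerating it. The directory name and `run_task` helper are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path

SCRIPTS_DIR = Path("agent_scripts")  # hypothetical cache location

def run_task(name: str, generate) -> str:
    """Run the cached script for `name`, asking the agent to generate it on first use."""
    SCRIPTS_DIR.mkdir(exist_ok=True)
    script = SCRIPTS_DIR / f"{name}.py"
    if not script.exists():
        script.write_text(generate())  # the agent writes the script exactly once
    out = subprocess.run([sys.executable, str(script)],
                         capture_output=True, text=True, check=True)
    return out.stdout

# First call generates and saves; later calls just execute the saved file.
print(run_task("hello", lambda: 'print("hi from cached script")'))
```

After the first run, `generate` is never consulted again for that task name, which is exactly the "stop rebuilding the same script" win.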
That too! Every time the agent does something I didn't intend, I end up making a tool or process guidance to prevent it from happening again. Not just add "don't do that" to the context.
There are some patterns here that everyone using AI should build in. The examples below are things I’m doing in my harness, which enhances Claude Code (https://codeleash.dev).
1. Self-correction (human out of the loop) - give the AI opportunities to see the mistakes it made and correct them. Think linting, but your agent wrote the linter weeks ago, it’s code, and its output is fed back with line numbers and recommended fixes. Maybe you have a specific architecture - why not have your agent write a script that walks your entire codebase’s AST and flags violations? If you guarantee that check gets run, you’ll never see a violation again, because the agent will fix them before declaring itself done. Bye bye, dumb AI mistakes.
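An AST-walking rule checker like the one described can be tiny with Python's stdlib `ast` module. The rule below (no wildcard imports) is just a stand-in; a real harness would encode its own architecture rules and feed the flagged lines back to the agent.

```python
import ast

def find_violations(source: str, filename: str = "<mem>") -> list[str]:
    """Walk one file's AST and report rule violations with line numbers."""
    tree = ast.parse(source, filename=filename)
    problems = []
    for node in ast.walk(tree):
        # Example rule: forbid `from x import *`
        if isinstance(node, ast.ImportFrom) and any(a.name == "*" for a in node.names):
            problems.append(f"{filename}:{node.lineno}: wildcard import from {node.module}")
    return problems

code = "from os.path import *\nx = 1\n"
print(find_violations(code, "app.py"))
# → ['app.py:1: wildcard import from os.path']
```

Run this over every file in the repo and pipe the output back to the agent, and "the check gets run" is a guarantee the harness enforces, not a promise the model makes.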
2. Success criteria - a more subjective version of self-correction. When the agent declares itself done, automatically run a process (agentic or code or both) that determines whether the agent’s job is truly done. For example, make your coding agent harness’s stop hook run the full test suite and feed back any failures. If the agent actually gets to finish its work and your harness notifies you, you have certainty that your tests all pass. For extra credit, you can have your harness tell the agent to review its own work against a self-review checklist. I’ve got this so dialed in that the only time I interact with an agent is to approve its initial plan, then again to approve the plan it comes up with after its self-review.
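A sketch of that stop-hook gate, under the assumption your harness can run a script when the agent tries to stop. In Claude Code, a hook that exits with code 2 blocks the action and feeds stderr back to the agent (check your harness's docs); the placeholder command below stands in for a real test suite such as `python -m pytest -q`.

```python
import subprocess
import sys

def gate(test_cmd: list[str]) -> tuple[bool, str]:
    """Return (passed, combined output) for the project's test command."""
    proc = subprocess.run(test_cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def stop_hook(test_cmd: list[str]) -> int:
    """Hook exit status: 0 lets the agent stop, 2 blocks it with feedback."""
    ok, output = gate(test_cmd)
    if not ok:
        print(output, file=sys.stderr)  # failures go back to the agent
        return 2
    return 0

# Harmless placeholder; a real hook would run the full test suite here.
print(stop_hook([sys.executable, "-c", "print('all tests pass')"]))  # → 0
```

Because the gate runs deterministically on every stop attempt, "done" means "the tests pass", not "the model said so".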
3. Self-reflection & continuous improvement. As the agent works, the harness should be generating logs of its work. Did it try to escape the test-driven-development state machine? Did it edit with shell commands instead of edit tools? Just before the agent is about to drop context (hits compaction or stops work), ask it to output details of anything it learned, as well as having it review its own work logs. These learnings can then be used to improve the harness, improve the self-review checklist, improve the docs (agent OR human docs), improve the automations, uncover process gaps, or even guide you to code refactorings so the agent doesn’t get surprises in future.
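The logging side of point 3 can be as simple as a JSONL work log the harness appends to, plus a summarizer that runs before the agent drops context. Event names like `shell_edit` and `tdd_escape` are made up here to mirror the examples above; a real harness would define its own vocabulary.

```python
import json
from collections import Counter
from io import StringIO

def summarize_log(lines) -> dict[str, int]:
    """Count logged events that indicate the agent skirted the process."""
    flagged = {"shell_edit", "tdd_escape"}  # hypothetical violation events
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("type") in flagged:
            counts[event["type"]] += 1
    return dict(counts)

# In practice `lines` would be an open JSONL file written during the session.
log = StringIO(
    '{"type": "edit_tool"}\n'
    '{"type": "shell_edit"}\n'
    '{"type": "shell_edit"}\n'
)
print(summarize_log(log))  # → {'shell_edit': 2}
```

The summary is what feeds the improvement loop: a spike in `shell_edit` counts is a concrete signal to tighten tooling or docs, rather than a vague feeling that the agent misbehaved.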
The result is you can trust the output. It’s all about putting a deterministic shell around the AI, by supervising a process while it does the work - because the surface area and complexity of a powerful process is so much lower than that of the work itself.
It’s just like management: set up guardrails, define the outcome, monitor the process, trust that it’ll lead to quality work, continuously improve everything at every opportunity.
The author calls it an ecosystem at one point. That’s overselling it.
I suspect “Copilot” is cargo culted naming across disparate parts of an org that’s home to upwards of 100,000 engineers who must all justify their latest bump in your subscription cost.
It’s amazing how much product Microsoft ships - that’s 95% of the thing... unfortunately the last 5% is the product polish that’d make their stuff actually good. :(
It’ll be interesting to see if Apple comes around on customization of apps in general, because hopefully that’ll soon be what users expect.
In a world where users expect to be able to customize software more and more, apps start to look quite rigid, and open platforms like the web that offer flexibility start to look more appealing.
Imagine a Lovable-style PWA that morphs into the app you vibecoded by storing the generated code in localStorage, for example - with cloud fallbacks to re-download the code if the storage is wiped.
Linux and Windows have always been a lot more customisable; Apple was always the "we know better than you what you want" company... And, often enough, they weren't wrong.
I agree! Right now it is leveraging the Codex App Server, which is open-source and very well implemented, but using Claude Code Channels is probably a bit hacky.
The good thing is that it establishes a direct connection so it's already much better than having one agent spawn the other and wait for its output, or read/write to a shared .md file -- but it would be cool to make it work for all agent harnesses.
Those with fond memories of a childhood spent playing games, typing in code from magazines, and having low-fidelity conversations with faraway like-minded folks need to know how lucky they were. These days that stuff may still be there, but for kids it pales next to addictive social media.