I would also recommend checking out https://inkog.io. It looks at similar patterns, runs directly in the browser (results in 60s), and it also builds an agent topology and checks for "human in the loop".
Interesting that NIST is pushing for machine-readable behavioral declarations for agents. Basically an SBOM equivalent — agents declaring what tools they can access and what they can't do.
RFI responses due March 9, concept papers April 2. Moves fast for a federal initiative.
Honestly yeah – static catches structural stuff (missing exit conditions). But the trickier loops are when the model keeps deciding to retry. Like "let me try one more search" forever. That's prompt behavior; you need runtime traces for those.
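For concreteness, the runtime-side check can be pretty dumb and still catch these: watch the trace and put a budget on repeated identical tool calls. Rough sketch, all names made up:

```python
from collections import Counter


class RetryBudget:
    """Watch a trace of (tool, args) calls and flag model-driven retry loops.

    Static analysis can't see these loops: they only exist because the model
    keeps choosing the same action again at runtime.
    """

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.counts = Counter()

    def record(self, tool: str, args: tuple) -> bool:
        """Record one call; return True once this (tool, args) pair exceeds its budget."""
        key = (tool, args)
        self.counts[key] += 1
        return self.counts[key] > self.max_repeats


budget = RetryBudget(max_repeats=3)
# model keeps deciding to "try one more search" with the same query
trace = [("search", ("llm retry loops",))] * 5
flags = [budget.record(tool, args) for tool, args in trace]
# first 3 calls pass; calls 4 and 5 trip the budget
```

In practice you'd key on something fuzzier than exact args (queries get paraphrased), but the shape is the same: it's a trace-level property, not a structural one.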
the point about this being an os problem not an ai problem resonates. letting untrusted agents drive your browser smells like a problem to me.
in practice we've had better luck running agents in lightweight sandboxes with explicit capability handles. curious if anyone's tried capability-based systems like seL4 for hosting agents; feels like mainstream oses have a long way to go here.
nice work. the idea of breaking agents into short-lived executors with explicit inputs/outputs makes a lot of sense - most failures i've seen come from agents staying alive too long and leaking assumptions across steps.
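the pattern i ended up with for this (my own names, just a sketch of the idea): each step is a function over declared inputs, and the executor holds no state before or after the call, so nothing can leak between steps.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StepResult:
    outputs: dict  # the only thing that survives the executor


def run_step(fn: Callable[[dict], dict], inputs: dict) -> StepResult:
    """Run one short-lived executor: explicit inputs in, explicit outputs out.

    The executor is created for this call and discarded after it, so
    assumptions can't accumulate the way they do in a long-lived agent loop.
    """
    outputs = fn(dict(inputs))  # copy so the step can't mutate caller state
    return StepResult(outputs=outputs)


# two steps chained only through their declared outputs
plan = run_step(lambda i: {"query": i["task"] + " site:docs"}, {"task": "find api"})
search = run_step(lambda i: {"hits": [i["query"]]}, plan.outputs)
```

the discipline matters more than the code: anything a step wants from an earlier step has to appear in that step's declared outputs, which makes the leaks visible.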
curious how you're handling context lifetimes when agents call other agents. do you drop context between calls or is there a way to bound it? that's been the trickiest part for us.