One thing Cloudflare Workers gets right is strong execution isolation.
When self-hosting, what’s the failure model if user code misbehaves?
Is there any runtime-level guardrail or tracing for side-effects?
Asking because execution is usually where things go sideways.
Workers that hit limits (CPU, memory, wall-clock) get terminated cleanly with a clear reason. Exceptions are caught with stack traces (at least they should be, lol), and logs stream in real time.
What's next: execution recording. Every invocation captures a trace: request, binding calls, timing. Replay locally or hand it to an AI debugger. No more "works on my machine".
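A minimal sketch of what per-invocation trace capture could look like. All names here are hypothetical illustrations, not the runtime's actual API; it just shows the shape of "request + binding calls + timing" in one replayable record:

```python
import json
import time
from dataclasses import dataclass, field

@dataclass
class InvocationTrace:
    """Hypothetical per-invocation trace: request, binding calls, timing."""
    request: dict
    events: list = field(default_factory=list)
    started: float = field(default_factory=time.monotonic)

    def record(self, kind: str, detail: dict) -> None:
        # Each binding call gets a relative timestamp so the trace
        # can be replayed in order later.
        self.events.append({
            "t": round(time.monotonic() - self.started, 6),
            "kind": kind,
            "detail": detail,
        })

    def dump(self) -> str:
        # One JSON document per invocation: easy to ship to a
        # local replayer or an AI debugger.
        return json.dumps({"request": self.request, "events": self.events})

trace = InvocationTrace(request={"method": "GET", "path": "/users/42"})
trace.record("kv.get", {"key": "user:42"})
trace.record("fetch", {"url": "https://api.example.com/orders"})
print(trace.dump())
```

The key design choice is that the trace is a plain serializable value, so "replay locally" is just feeding the same record back through the runtime.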
This makes a lot of sense. Recording execution + replay is exactly what’s missing once you move past simple logging.
One thing I’ve found tricky in similar setups is making sure the trace is captured before side-effects happen, otherwise replay can lie to you. If you get that boundary right, the prod → replay → fix → verify loop becomes much more reliable.
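The "capture before side-effects" ordering can be sketched as a wrapper that appends the intended call to the trace *before* executing it; if logging happened after execution, a crash mid-effect would leave the effect applied but missing from the trace, and replay would lie. Names here are illustrative only:

```python
import functools

TRACE: list[dict] = []

def traced(fn):
    """Record intent BEFORE running the side-effect.

    Order matters: (1) log the intended call, (2) perform the effect,
    (3) mark completion. A crash between (2) and (3) still leaves an
    honest 'attempted but unconfirmed' entry for the replayer.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        entry = {"call": fn.__name__, "args": args,
                 "kwargs": kwargs, "done": False}
        TRACE.append(entry)           # 1. record intent first
        result = fn(*args, **kwargs)  # 2. only then perform the effect
        entry["done"] = True          # 3. mark completion
        return result
    return wrapper

@traced
def write_file(path: str, data: str) -> int:
    # Stand-in for a real side-effect (filesystem write, network call...).
    return len(data)

write_file("/tmp/out.txt", "hello")
```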
How do you handle execution-time guarantees?
For example: when an MCP tool call touches the filesystem or network,
do you validate + log the side-effects before execution?
I’ve seen audits fail not at planning, but at the exact tool-call boundary.
One thing I’ve been bitten by with desktop agents is execution-time safety:
the plan is correct, but a single malformed path or OS call causes real damage.
Do you enforce any guardrails at the tool boundary
(e.g. path sandboxing, network allowlists, dry-run / replay)?
Phenomenal questions. Sandboxing would be a phenomenal addition. Allowlisting is currently possible, but it requires code changes, so a configuration-based approach is probably closer to what you're referring to?
The replay feature builds on the record feature, though I wouldn't call it a "guardrail".
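A configuration-driven allowlist at the tool boundary could look something like this sketch. It is purely illustrative (not the project's actual config format): a declarative policy for network hosts and filesystem roots, checked before any call executes:

```python
import os
from urllib.parse import urlparse

# Hypothetical declarative policy a user could ship as config
# instead of editing code.
POLICY = {
    "network_allowlist": ["api.example.com", "internal.example.com"],
    "fs_roots": ["/srv/app/data"],
}

def check_network(url: str) -> bool:
    """Allow only hosts explicitly named in the config."""
    host = urlparse(url).hostname or ""
    return host in POLICY["network_allowlist"]

def check_path(path: str) -> bool:
    """Allow only paths under the configured roots.

    normpath resolves '..' first, so traversal like
    '/srv/app/data/../../etc/passwd' is rejected.
    """
    real = os.path.normpath(os.path.join("/", path))
    return any(real == root or real.startswith(root + "/")
               for root in POLICY["fs_roots"])

check_network("https://api.example.com/v1")            # in the allowlist
check_path("/srv/app/data/../../etc/passwd")           # resolves outside the root
```

Keeping the policy as plain data (rather than code) is what makes it reviewable and swappable per deployment.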
I mostly worry about the gap between a correct plan and execution-time behavior — especially when tools touch the filesystem or OS APIs. Even a single malformed argument can have irreversible effects.
Totally agree these guardrails are non-trivial, but it’s great to see the project thinking in this direction.
FailCore is intentionally not an agent framework, planner, or sandbox.
It sits strictly at the execution boundary and focuses on two things:
1) blocking unsafe side effects before they happen
2) recording enough execution trace to replay or audit failures later
The goal isn’t to make agents smarter, but to make their failures
observable, reproducible, and boring.
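In spirit, the execution-boundary contract above could be sketched like this (illustrative only, not FailCore's actual API): check policy, record the call, and only then execute, so every blocked or allowed side effect leaves an audit entry:

```python
class BlockedSideEffect(Exception):
    """Raised when a tool call violates policy, before any effect occurs."""

AUDIT_LOG: list[dict] = []

def guarded_call(tool_name, fn, policy, *args, **kwargs):
    """Check policy, record the call, then execute, in that order."""
    allowed = policy(tool_name, args, kwargs)
    AUDIT_LOG.append({"tool": tool_name, "args": args,
                      "kwargs": kwargs, "allowed": allowed})
    if not allowed:
        # 1) block unsafe side effects before they happen
        raise BlockedSideEffect(f"{tool_name} blocked by policy")
    # 2) the entry recorded above is enough to replay/audit later
    return fn(*args, **kwargs)

# Toy policy: reads are fine, deletes are not.
def no_deletes(name, args, kwargs):
    return name != "fs.delete"

guarded_call("fs.read", lambda p: f"<{p}>", no_deletes, "/tmp/x")
try:
    guarded_call("fs.delete", lambda p: None, no_deletes, "/tmp/x")
except BlockedSideEffect:
    pass  # the delete never ran, but it was still recorded
```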
If people are curious, the DESIGN.md goes deeper into why this is done
at the Python runtime level instead of kernel-level isolation (eBPF, VMs, etc.),
and what trade-offs that implies.
My motivation was to give PR/MR reviewers a very low-friction way to see a Helm chart change running.
The workflow is intentionally simple: install a GitHub App (or call a REST API in other workflows), open a PR/MR, and you get a live preview. That’s it.
There’s no ArgoCD setup, no Helmfile, no cluster provisioning, no DNS wiring to build or maintain. The goal was to make it trivial for reviewers to see “this PR running” — especially for public Helm charts where contributors and reviewers can’t realistically be expected to set up infrastructure just to demo a change.
If you already run ephemeral previews via ArgoCD or Helmfile, this probably isn’t adding much value. Those approaches work well once they’re in place. Chart Preview is aimed at the cases where teams want PR previews without having to design, build, and maintain that machinery themselves.
That makes sense — thanks for clarifying. Framing it as “zero infra ownership, just a reviewer convenience” really helps explain where this fits compared to ArgoCD-style previews.
If you use `culsans.Queue().async_q` as a direct replacement for `asyncio.Queue()`, then there is essentially no difference. The difference becomes apparent when you use additional features:
1. If checkpoints are enabled (by default when using Trio, or if you explicitly apply `aiologic.lowlevel.enable_checkpoints()`), then every call that is not explicitly non-blocking can be cancelled (even if no waiting is required). For comparison, `await queue.put(await queue.get())` for `queue = asyncio.Queue()` in an infinite loop will never yield back to the event loop (when 0 < size < maxsize is true), and as a result, no other asyncio tasks will ever continue their execution, and such a loop cannot be cancelled (see PEP 492).
2. Under multithreading (and the race conditions it brings), method calls are synchronized via an underlying lock (as in `queue.Queue`). This synchronization can briefly block the event loop, but it is rarely a bottleneck (Janus does the same). In general, it delays task cancellation and timeout handling while another thread still holds the lock. If you need extremely fast and scalable queues, `aiologic.SimpleQueue` may be the better option: it uses no form of internal state synchronization at all!
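The non-yielding behavior of plain `asyncio.Queue` described in point 1 can be demonstrated directly with the stdlib alone (no culsans/aiologic needed):

```python
import asyncio

async def main() -> bool:
    q: asyncio.Queue = asyncio.Queue(maxsize=2)
    q.put_nowait("item")  # keep 0 < size < maxsize throughout

    other_ran = False

    async def other():
        nonlocal other_ran
        other_ran = True

    task = asyncio.ensure_future(other())

    # Neither get() nor put() ever has to wait here, so neither call
    # contains a suspension point: this loop never yields control back
    # to the event loop, and `other` is starved the whole time.
    for _ in range(10_000):
        await q.put(await q.get())

    ran_during_loop = other_ran  # still False: the hot loop never yielded
    await task                   # first real suspension; now `other` runs
    return ran_during_loop

starved = asyncio.run(main())
```

With checkpoints enabled (as described above), every such call would contain a cancellation/scheduling point even when no waiting is required, so the starvation disappears.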
I am not sure I understand your question well enough. `asyncio.Queue` works exclusively under cooperative multitasking (it is not thread-safe), with all the simplifications that implies. Under the same conditions, Culsans queues operate much like any other queue capable of running as purely asynchronous with cancellation support (perhaps you are referring to starting new threads or new tasks as an implementation detail? Neither aiologic nor Culsans does any of that). As soon as preemptive multitasking is introduced, the behavior changes somewhat: `culsans.Queue` relies on sync-only synchronization of the internal state; `aiologic.Queue` uses async-aware synchronization without blocking the event loop (a lock is still needed because `heapq` functions are not thread-safe and priority queues require them, but the wait queues are combined, which achieves fairness and solves python/cpython#90968); and `aiologic.SimpleQueue` does not synchronize the internal state at all, thanks to effectively atomic operations.
I would also add that some non-trivial details are covered in the "Performance" section of the aiologic documentation [4]. What is described there for standard primitives also applies to Culsans queues (specifically, the mutex case). Other sections, such as "Why?", "Overview", and "Libraries", are relevant to Culsans as well, since aiologic is used under the hood.
Thanks, that clarifies it. The checkpoint-based cancellation and the sync-vs-async locking model differences were exactly what I was trying to understand.