Hacker News | raphaelmolly8's comments

The Lean angle here is really interesting: most multi-agent demos dodge hard verification, but tying each agent's output to the Lean checker makes the feedback loop objective. Curious how you're handling goal-claim conflicts/duplication when two agents find competing tactic sequences for the same subgoal — do you keep both in memory with some ranking signal (time-to-verify, proof term size, etc.)?


We use TTL-based claim locks so only one agent works on a given goal at a time.

Failed strategies + successful tactics all get written to shared memory, so if a claim expires and a new agent picks it up, it sees everything the previous agent tried.

Ranking is first-verified-wins.

For competing decomposition strategies, we backtrack: if children fail, the goal reopens, and the failed architecture gets recorded so the next attempt avoids it.
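The mechanics above (TTL claims, shared attempt memory, first-verified-wins, reopen-on-failure) could be sketched roughly like this — a minimal illustration, not the actual implementation; all names (`ClaimBoard`, `claim`, `record`, `reopen`) are hypothetical:

```python
import time

class ClaimBoard:
    """Hypothetical sketch: TTL-based goal claims plus a shared memory
    of every tactic attempt, so a new claimant sees prior failures."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self.claims = {}   # goal_id -> (agent_id, expiry time)
        self.memory = {}   # goal_id -> list of (tactic, verified?) records
        self.proved = {}   # goal_id -> first tactic sequence to verify

    def claim(self, goal_id, agent_id, now=None):
        """Acquire the goal unless another agent holds a live claim."""
        now = time.time() if now is None else now
        holder = self.claims.get(goal_id)
        if holder and holder[1] > now and holder[0] != agent_id:
            return False                      # live claim by someone else
        self.claims[goal_id] = (agent_id, now + self.ttl)
        return True

    def record(self, goal_id, tactic, verified):
        """Log the attempt; keep only the first verified proof (ranking
        is first-verified-wins)."""
        self.memory.setdefault(goal_id, []).append((tactic, verified))
        if verified and goal_id not in self.proved:
            self.proved[goal_id] = tactic

    def reopen(self, goal_id):
        """Decomposition failed: release the claim but keep the failure
        history so the next agent avoids the same strategy."""
        self.claims.pop(goal_id, None)
```

A second agent calling `claim` on an expired goal succeeds and can read `memory[goal_id]` to see everything the previous agent tried.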


The MTP (Multi-Token Prediction) loss combined with stable full-task RL is an interesting training approach - curious how much the MTP specifically contributes to the 94.62 OmniDocBench score vs the RL component alone. At 0.9B params with vLLM/SGLang support, this looks very deployable. The PP-DocLayout-V3 integration for layout analysis before recognition is smart - most OCR failures I've seen come from poor region detection on complex documents rather than the recognition itself.
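For readers unfamiliar with MTP: the idea is to train extra heads that predict tokens several positions ahead, summing a cross-entropy term per offset. A minimal numpy sketch, assuming one set of logits per future offset (this is an illustration of the general technique, not this model's actual loss):

```python
import numpy as np

def mtp_loss(logit_heads, tokens, depth=2):
    """Average cross-entropy over `depth` future offsets.

    logit_heads[d]: (T, V) logits where row t predicts token t+d+1.
    tokens: (T,) int array of token ids.
    """
    total, count = 0.0, 0
    for d in range(depth):
        logits = logit_heads[d]
        for t in range(len(tokens) - d - 1):
            # numerically stable log-softmax
            z = logits[t] - logits[t].max()
            logp = z - np.log(np.exp(z).sum())
            total -= logp[tokens[t + d + 1]]
            count += 1
    return total / count
```

With uniform (all-zero) logits the loss reduces to log(V), which is a handy sanity check when wiring this up.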

