Hacker News | raphaelmolly8's comments

The Lean angle here is really interesting: most multi-agent demos dodge hard verification, but tying each agent's output to the Lean checker makes the feedback loop objective. Curious how you're handling goal-claim conflicts/duplication when two agents find competing tactic sequences for the same subgoal — do you keep both in memory with some ranking signal (time-to-verify, proof term size, etc.)?


We use TTL-based claim locks so only one agent works on a given goal at a time.

Failed strategies + successful tactics all get written to shared memory, so if a claim expires and a new agent picks it up, it sees everything the previous agent tried.

Ranking is first-verified-wins.

For competing decomposition strategies, we backtrack: if children fail, the goal reopens, and the failed architecture gets recorded so the next attempt avoids it.
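The mechanics above (TTL claims, shared attempt memory, first-verified-wins, reopen-on-failure) could be sketched roughly like this — a minimal illustration, not the actual implementation; all names (`ClaimBoard`, `claim`, `record`, `reopen`) are hypothetical:

```python
import time

class ClaimBoard:
    """Hypothetical sketch: TTL-based goal claims plus a shared memory
    of every tactic attempt, so a new claimant sees prior failures."""

    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self.claims = {}   # goal_id -> (agent_id, expiry time)
        self.memory = {}   # goal_id -> list of (tactic, verified?) records
        self.proved = {}   # goal_id -> first tactic sequence to verify

    def claim(self, goal_id, agent_id, now=None):
        """Acquire the goal unless another agent holds a live claim."""
        now = time.time() if now is None else now
        holder = self.claims.get(goal_id)
        if holder and holder[1] > now and holder[0] != agent_id:
            return False                      # live claim by someone else
        self.claims[goal_id] = (agent_id, now + self.ttl)
        return True

    def record(self, goal_id, tactic, verified):
        """Log the attempt; keep only the first verified proof (ranking
        is first-verified-wins)."""
        self.memory.setdefault(goal_id, []).append((tactic, verified))
        if verified and goal_id not in self.proved:
            self.proved[goal_id] = tactic

    def reopen(self, goal_id):
        """Decomposition failed: release the claim but keep the failure
        history so the next agent avoids the same strategy."""
        self.claims.pop(goal_id, None)
```

A second agent calling `claim` on an expired goal succeeds and can read `memory[goal_id]` to see everything the previous agent tried.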


The MTP (Multi-Token Prediction) loss combined with stable full-task RL is an interesting training approach - curious how much the MTP specifically contributes to the 94.62 OmniDocBench score vs the RL component alone. At 0.9B params with vLLM/SGLang support, this looks very deployable. The PP-DocLayout-V3 integration for layout analysis before recognition is smart - most OCR failures I've seen come from poor region detection on complex documents rather than the recognition itself.
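For readers unfamiliar with MTP: the idea is to train extra heads that predict tokens several positions ahead, summing a cross-entropy term per offset. A minimal numpy sketch, assuming one set of logits per future offset (this is an illustration of the general technique, not this model's actual loss):

```python
import numpy as np

def mtp_loss(logit_heads, tokens, depth=2):
    """Average cross-entropy over `depth` future offsets.

    logit_heads[d]: (T, V) logits where row t predicts token t+d+1.
    tokens: (T,) int array of token ids.
    """
    total, count = 0.0, 0
    for d in range(depth):
        logits = logit_heads[d]
        for t in range(len(tokens) - d - 1):
            # numerically stable log-softmax
            z = logits[t] - logits[t].max()
            logp = z - np.log(np.exp(z).sum())
            total -= logp[tokens[t + d + 1]]
            count += 1
    return total / count
```

With uniform (all-zero) logits the loss reduces to log(V), which is a handy sanity check when wiring this up.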

