Got it, and how well does it work with more complex documents, like those with a lot of images or intricate tables? I'm curious about how accurately it aligns the content with the source code in those cases.
We use multimodal RAG and tools similar to unstructured.io. We generate structured output, then use an LLM again to match it against our AST-parsed source code. The matching part is really complex right now and still needs manual inspection and validation.