I'm not sure it's all that easy though. We don't entirely know how LLMs do some of the things they do, and we can't interpret what's in them. They don't internally look up particular sources, it's just a big mess of connection weights.
Maybe my simple scheme would be all it needs. Or maybe it needs some new breakthrough and right now nobody knows how to do it. I was hoping some resident expert could let me know.
First, it seems possible that if sources were in the training data like I described, then understanding of sources could be an emergent capability, just because the LLM reads "the source of the following is X."
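To make the first idea concrete, here is a minimal sketch of what source-tagged training data might look like. The tag phrasing, field names, and example documents are all made up for illustration; no real training pipeline is implied.

```python
# Sketch: prepend a provenance tag to each training document so a model
# could, in principle, learn source/content associations during pretraining.
# The tag format and the sample docs below are assumptions, not a real schema.

def tag_with_source(text: str, source: str) -> str:
    """Prefix a training document with a sentence naming its source."""
    return f"The source of the following is {source}.\n{text}"

docs = [
    {"text": "Water boils at 100 C at sea level.", "source": "example-encyclopedia.org"},
    {"text": "The Moon orbits Earth in about 27.3 days.", "source": "example-astronomy-text"},
]

training_corpus = [tag_with_source(d["text"], d["source"]) for d in docs]
for sample in training_corpus:
    print(sample)
    print("---")
```

Whether a model trained on such tags would actually learn to attribute claims correctly, rather than just learning to emit plausible-looking tags, is exactly the open question in this thread.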
Second, maybe a trainer LLM could be tasked with reading the trainee's answers and any sources it provides, and judging whether the source is correct.
Well, you can train the LLM to "provide a source", but LLMs are prone to hallucination. You run into the same problem with a trainer model; the trainer also has no way to confirm where the model actually got the answer from. One thing that may work is a fundamental architectural shift where the LLM looks up all its info as it needs it, and then you can just list the sources it actually used. Microsoft tried that with Bing, but it turns out the model will search for a website, read it, and then ignore what the website says and claim "according to this website, <some other belief>". So it's definitely not easy at any rate.
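The "look everything up as needed" architecture can be sketched in a few lines: if the model only answers from passages it retrieved, the sources it used can be listed mechanically rather than hallucinated. The corpus, the keyword-overlap retrieval, and the doc IDs below are toy stand-ins, not anything Bing or any real retrieval system does, and the sketch deliberately ignores the failure mode described above, where the model reads a source and then contradicts it anyway.

```python
# Toy retrieval-augmented answering: the "model" can only use retrieved
# passages, so its source list is exact by construction. All data and the
# keyword matcher are illustrative assumptions.

CORPUS = {
    "example-encyclopedia/water": "Water boils at 100 C at standard sea-level pressure.",
    "example-astronomy/moon": "The Moon orbits Earth in about 27.3 days.",
}

def retrieve(query: str):
    """Return (doc_id, text) pairs sharing at least one word with the query."""
    words = set(query.lower().split())
    return [(doc_id, text) for doc_id, text in CORPUS.items()
            if words & set(text.lower().split())]

def answer_with_sources(query: str):
    hits = retrieve(query)
    answer = " ".join(text for _, text in hits)  # a real model would synthesize, not concatenate
    sources = [doc_id for doc_id, _ in hits]
    return answer, sources

ans, srcs = answer_with_sources("When does water boil?")
print(ans)
print("Sources:", srcs)
```

The point of the sketch is only that attribution becomes a bookkeeping problem in this architecture; the hard part, as the post notes, is getting the model to stay faithful to what it retrieved.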
Sure, but I don't think it's necessary to confirm where the AI got something. There might be lots of sources. I often tell someone some fact I know, mention a source if I remember it, and possibly google a link. An AI could do much the same. In training, the trainer could just check whether the claimed sources actually say something similar to what the trainee attributed to them.
In operation, the "trainer" could do the same thing in the background. And then of course, human users could also check up on the sources if they need to be sure of catching hallucinations.
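The check being proposed here, "does the claimed source actually say something like this?", can be sketched crudely. A real checker would fetch the cited page and use an embedding model or an LLM judge; here `difflib` stands in as a deliberately simple text-similarity measure, and the 0.6 threshold is an arbitrary assumption.

```python
# Sketch of a background source-check: slide over the source text and see
# whether any passage resembles the claim. difflib and the threshold are
# crude stand-ins for a real semantic-similarity check.
import difflib

def source_supports_claim(claim: str, source_text: str, threshold: float = 0.6) -> bool:
    """Return True if some passage of the source text resembles the claim."""
    claim = claim.lower()
    text = source_text.lower()
    window = max(len(claim), 1)
    best = 0.0
    for start in range(0, max(len(text) - window + 1, 1), max(window // 2, 1)):
        passage = text[start:start + window * 2]
        ratio = difflib.SequenceMatcher(None, claim, passage).ratio()
        best = max(best, ratio)
    return best >= threshold

source = "The Eiffel Tower was completed in 1889 for the World's Fair in Paris."
print(source_supports_claim("The Eiffel Tower was completed in 1889.", source))
print(source_supports_claim("Bananas are rich in potassium.", source))
```

Note this only catches the "source doesn't say that" class of hallucination; it can't tell whether the model genuinely derived its answer from that source, which is the limitation raised earlier in the thread.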