>As a rule, chatbots today have a propensity to confidently make stuff up, or, as some researchers say, “hallucinate.” At the root of these hallucinations is an inability to introspect: the A.I. doesn’t know what it does and doesn’t know.
The last bit doesn't seem to be true. There's quite a lot of indication that the computation can distinguish hallucinations. It just has no incentive to communicate this.
GPT-4 logits calibration pre RLHF - https://imgur.com/a/3gYel9r
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback - https://arxiv.org/abs/2305.14975
Teaching Models to Express Their Uncertainty in Words - https://arxiv.org/abs/2205.14334
Language Models (Mostly) Know What They Know - https://arxiv.org/abs/2207.05221
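The calibration these papers talk about is measurable: you bin the model's stated confidences and compare each bin's average confidence to its actual accuracy. A minimal sketch of that metric (expected calibration error); all names and numbers here are my own, not from the papers:

```python
# Toy expected calibration error (ECE): the weighted average gap between
# stated confidence and actual accuracy, computed per confidence bin.

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE over (confidence, was-it-correct) pairs; lower is better calibrated."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Each prediction lands in exactly one bin (top edge inclusive in last bin).
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece

# A perfectly calibrated toy model: says 80% confident, right 4 times out of 5.
confs = [0.8] * 5
hits = [1, 1, 1, 1, 0]
print(round(expected_calibration_error(confs, hits), 6))  # → 0.0
```

An overconfident model (always 100% confident, right half the time) would instead score an ECE of 0.5.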
Also even if we're strictly talking about text, there is still a ton of data left to train on. We've just barely reached what is easily scrapable online and are nowhere near a real limit yet. And of course, you can just train more than one epoch. That said, it's very clear quality data is far more helpful than sheer quantity and sheer quantity is more likely than not to derail progress.
You're right, but they can give it an incentive to communicate that; that should be pretty easy.
Right now it would be pretty easy to simply take ChatGPT output, feed it back in in a different thread (or even to a different model, such as Claude), and ask it which items in the response should be fact-checked, and also just to point out any that seem obviously wrong.
The former should be really easy to do; it doesn't have to know whether it's right or wrong -- it just has to know that it is a checkable fact. Take the well-known case of the lawyer citing a non-existent case from ChatGPT: the model could say "this case should be fact-checked to see that it is real and says what I said it said". Based on my experience with ChatGPT (GPT-4 especially), this should be well within its current capabilities. (I'm going to try an experiment now.)
They could probably start having it do this behind the scenes: check its own facts and learn from the results, so it learns when it is likely to hallucinate and learns to pick up on it. Even if for safety reasons it's not going out and hitting the web every time you ask a question, it could give you a list at the end of the response of everything in the response you might want to check for yourself, maybe even suggesting Google searches to run.
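The two-pass idea above is mostly plumbing. A runnable sketch, with `ask_model` stubbed out (in practice it would be a real API call to ChatGPT, Claude, etc.), and with prompt wording and parsing that are my own guesses:

```python
# Second-pass self-audit: feed an answer back to a model and ask which of its
# statements are checkable facts. `ask_model` is a placeholder for a real LLM
# call; here it returns a canned reply so the pipeline runs end to end.

AUDIT_PROMPT = (
    "List every claim in the following answer that a reader should "
    "fact-check, one per line, prefixed with 'CHECK:'.\n\nAnswer:\n{answer}"
)

def ask_model(prompt: str) -> str:
    # Stand-in for a real API call; a canned audit for the demo.
    return ("CHECK: Smith v. Jones (2019) exists and says what was claimed.\n"
            "CHECK: The filing deadline is 30 days.")

def claims_to_check(answer: str) -> list[str]:
    """Parse the audit reply into a list of claims worth verifying."""
    reply = ask_model(AUDIT_PROMPT.format(answer=answer))
    return [line.removeprefix("CHECK:").strip()
            for line in reply.splitlines() if line.startswith("CHECK:")]

for claim in claims_to_check("Per Smith v. Jones (2019), you have 30 days to file."):
    print("fact-check:", claim)
```

Pointing the audit pass at a *different* model, as suggested above, only changes what `ask_model` calls.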
It just seems odd to me that it's not given an incentive to communicate this.
Surely humans using it would find great value in knowing the model's confidence, or whether it thinks it's confabulating or not.
These services are created to give the best product to users, and so wouldn't this be a better product? Therefore there is incentive. Happier users and a product that is better than competitors.
Go read through any mass of training data and count how often "I don't know" appears. It's going to be very small. Internet fora are probably the worst because people who are aware that they don't know usually refrain from posting.
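The claim above is easy to spot-check on any corpus sample: count hedging phrases per thousand words. The phrase list and sample text here are made up for the demo:

```python
# Rough hedge counter: how often does "I don't know"-style language appear
# in a chunk of text, normalized per 1,000 words?
import re

HEDGES = [r"i don'?t know", r"i'?m not sure", r"no idea"]

def hedge_rate(text: str) -> float:
    """Hedging phrases per 1,000 words of text."""
    words = len(text.split())
    hits = sum(len(re.findall(p, text, re.IGNORECASE)) for p in HEDGES)
    return 1000 * hits / max(words, 1)

sample = ("The answer is clearly X. Anyone who disagrees is wrong. "
          "Actually I'm not sure, but probably X.")
print(hedge_rate(sample))
```

Run over real forum dumps, the prediction is that this number comes out very small.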
>These services are created to give the best product to users, and so wouldn't this be a better product? Therefore there is incentive. Happier users and a product that is better than competitors.
Why would the computation care about any of that? I'm talking about incentive for the model.
The incentive for the model is to survive RLHF feedback from contract workers who are paid to review LLM output all day. They're paid for quantity, not quality, so the optimal strategy is to hallucinate some convincing lies.
I think that works OK as long as token probability and correctness are related. If, in the extreme, all the training data on some point is wrong, I'm not sure there is a good way to do this. Maybe I am misunderstanding, though.
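If token probability does track correctness, the simplest use of it is to flag the low-probability spans of an answer for review. A sketch with invented token/logprob pairs; a real run would take them from an API's logprobs output:

```python
# Flag tokens whose model probability falls below a threshold, on the
# assumption (discussed above) that probability and correctness correlate.
import math

def flag_uncertain(tokens, logprobs, threshold=0.5):
    """Return tokens the model assigned probability < threshold."""
    return [t for t, lp in zip(tokens, logprobs) if math.exp(lp) < threshold]

tokens   = ["The", "case", "was", "decided", "in", "1987"]
logprobs = [-0.01, -0.05, -0.02, -0.3, -0.1, -1.6]   # made-up values
print(flag_uncertain(tokens, logprobs))  # → ['1987']
```

This is exactly where the "all training data is wrong" extreme breaks down: a confidently wrong corpus yields high probability on the wrong tokens, and nothing gets flagged.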
It might also need to be able to distinguish between Knightian uncertainties and probabilities when there is nothing to base things on.
What it needs is a hierarchy of evidence. This works almost unreasonably well right now, either because we're lucky that more digitized text than not is largely true, or because RLHF is just that effective. But at some point the learner has to understand that a chemistry textbook and Reddit carry equal weight for learning how to construct syntactically well-formed sentences with human-intelligible semantic content, but not equal weight with respect to factual accuracy.
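One way to read that hierarchy in training terms: keep every source at full weight for the language-modeling objective, but down-weight unreliable sources in anything factual. A toy sketch; the trust weights and source names are purely illustrative:

```python
# Trust-weighted factual loss: examples from low-trust sources contribute
# less to the "factual" training signal than examples from high-trust ones.

SOURCE_TRUST = {"chemistry_textbook": 1.0, "wikipedia": 0.8, "reddit": 0.2}

def weighted_factual_loss(examples):
    """examples: list of (source, per_example_loss). Returns trust-weighted mean."""
    total = sum(SOURCE_TRUST.get(src, 0.5) * loss for src, loss in examples)
    weight = sum(SOURCE_TRUST.get(src, 0.5) for src, _ in examples)
    return total / weight

batch = [("chemistry_textbook", 0.2), ("reddit", 1.0)]
# Plain mean would be 0.6; down-weighting Reddit pulls it toward the textbook.
print(weighted_factual_loss(batch))
```

The hard part, of course, is that nothing in this sketch says how to separate "syntactic" from "factual" learning inside one next-token objective.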
I think if you consider the audience this was written for, and the various caveats each of your citations involves, the "last bit" does indeed seem to be true if you evaluate it in good faith rather than trying to discredit, on a technicality, a piece that is still useful for the layman.
That's a pretty big technicality. The potential implications if it's right or wrong are entirely different. I also don't understand how simply pointing this out is "trying to discredit the piece". I was about as neutral as I could be and made no comment about the author or their intentions.