>As a rule, chatbots today have a propensity to confidently make stuff up, or, as some researchers say, “hallucinate.” At the root of these hallucinations is an inability to introspect: the A.I. doesn’t know what it does and doesn’t know.
The last bit doesn't seem to be true. There's quite a lot of indication that the computation can distinguish hallucinations. It just has no incentive to communicate this.
GPT-4 logits calibration pre RLHF - https://imgur.com/a/3gYel9r
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback - https://arxiv.org/abs/2305.14975
Teaching Models to Express Their Uncertainty in Words - https://arxiv.org/abs/2205.14334
Language Models (Mostly) Know What They Know - https://arxiv.org/abs/2207.05221
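The calibration these papers talk about is measurable: you bin the model's stated confidences and compare each bin's average confidence to its actual accuracy. A minimal sketch of that metric (expected calibration error); all names and numbers here are my own, not from the papers:

```python
# Toy expected calibration error (ECE): the weighted average gap between
# stated confidence and actual accuracy, computed per confidence bin.

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE over (confidence, was-it-correct) pairs; lower is better calibrated."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Each prediction lands in exactly one bin (top edge inclusive in last bin).
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece

# A perfectly calibrated toy model: says 80% confident, right 4 times out of 5.
confs = [0.8] * 5
hits = [1, 1, 1, 1, 0]
print(round(expected_calibration_error(confs, hits), 6))  # → 0.0
```

An overconfident model (always 100% confident, right half the time) would instead score an ECE of 0.5.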
Also even if we're strictly talking about text, there is still a ton of data left to train on. We've just barely reached what is easily scrapable online and are nowhere near a real limit yet. And of course, you can just train more than one epoch. That said, it's very clear quality data is far more helpful than sheer quantity and sheer quantity is more likely than not to derail progress.
You're right, but they can give it an incentive to communicate that; that should be pretty easy.
Right now it would be pretty easy to simply take ChatGPT output, feed it back in in a different thread (or even to a different model, such as Claude), and ask it which items in the response should be fact-checked, and also just to point out any that seem obviously wrong.
The former should be really easy to do; it doesn't have to know whether it's right or wrong -- it just has to know that it is a checkable fact. Take the well-known case of the lawyer citing a non-existent case from ChatGPT: the model could say "this case should be fact-checked to see that it is real and says what I said it said". Based on my experience with ChatGPT (GPT-4 especially), this should be well within its current capabilities. (I'm going to try an experiment now.)
They could probably start having it do this behind the scenes: check its own facts and learn from the results, so it learns when it is likely to hallucinate and learns to pick up on it. Even if for safety reasons it's not going out and hitting the web every time you ask a question, it could give you a list at the end of the response of everything in the response you might want to check for yourself, maybe even suggesting Google searches to run.
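The two-pass idea above is mostly plumbing. A runnable sketch, with `ask_model` stubbed out (in practice it would be a real API call to ChatGPT, Claude, etc.), and with prompt wording and parsing that are my own guesses:

```python
# Second-pass self-audit: feed an answer back to a model and ask which of its
# statements are checkable facts. `ask_model` is a placeholder for a real LLM
# call; here it returns a canned reply so the pipeline runs end to end.

AUDIT_PROMPT = (
    "List every claim in the following answer that a reader should "
    "fact-check, one per line, prefixed with 'CHECK:'.\n\nAnswer:\n{answer}"
)

def ask_model(prompt: str) -> str:
    # Stand-in for a real API call; a canned audit for the demo.
    return ("CHECK: Smith v. Jones (2019) exists and says what was claimed.\n"
            "CHECK: The filing deadline is 30 days.")

def claims_to_check(answer: str) -> list[str]:
    """Parse the audit reply into a list of claims worth verifying."""
    reply = ask_model(AUDIT_PROMPT.format(answer=answer))
    return [line.removeprefix("CHECK:").strip()
            for line in reply.splitlines() if line.startswith("CHECK:")]

for claim in claims_to_check("Per Smith v. Jones (2019), you have 30 days to file."):
    print("fact-check:", claim)
```

Pointing the audit pass at a *different* model, as suggested above, only changes what `ask_model` calls.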
It just seems odd to me that it's not given an incentive to communicate this.
Surely humans using it would find great value in knowing the model's confidence, or whether it thinks it's confabulating or not.
These services are created to give the best product to users, and so wouldn't this be a better product? Therefore there is incentive. Happier users and a product that is better than competitors.
Go read through any mass of training data and count how often "I don't know" appears. It's going to be very small. Internet fora are probably the worst because people who are aware that they don't know usually refrain from posting.
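The claim above is easy to spot-check on any corpus sample: count hedging phrases per thousand words. The phrase list and sample text here are made up for the demo:

```python
# Rough hedge counter: how often does "I don't know"-style language appear
# in a chunk of text, normalized per 1,000 words?
import re

HEDGES = [r"i don'?t know", r"i'?m not sure", r"no idea"]

def hedge_rate(text: str) -> float:
    """Hedging phrases per 1,000 words of text."""
    words = len(text.split())
    hits = sum(len(re.findall(p, text, re.IGNORECASE)) for p in HEDGES)
    return 1000 * hits / max(words, 1)

sample = ("The answer is clearly X. Anyone who disagrees is wrong. "
          "Actually I'm not sure, but probably X.")
print(hedge_rate(sample))
```

Run over real forum dumps, the prediction is that this number comes out very small.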
>These services are created to give the best product to users, and so wouldn't this be a better product? Therefore there is incentive. Happier users and a product that is better than competitors.
Why would the computation care about any of that? I'm talking about incentive for the model.
The incentive for the model is to survive RLHF feedback from contract workers who are paid to review LLM output all day. They're paid for quantity, not quality, so the optimal strategy is to hallucinate some convincing lies.
I think that works OK as long as token probability and correctness are related. If, in the extreme, all the training data on some point is wrong, I'm not sure there is a good way to do this. Maybe I am misunderstanding, though.
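If token probability does track correctness, the simplest use of it is to flag the low-probability spans of an answer for review. A sketch with invented token/logprob pairs; a real run would take them from an API's logprobs output:

```python
# Flag tokens whose model probability falls below a threshold, on the
# assumption (discussed above) that probability and correctness correlate.
import math

def flag_uncertain(tokens, logprobs, threshold=0.5):
    """Return tokens the model assigned probability < threshold."""
    return [t for t, lp in zip(tokens, logprobs) if math.exp(lp) < threshold]

tokens   = ["The", "case", "was", "decided", "in", "1987"]
logprobs = [-0.01, -0.05, -0.02, -0.3, -0.1, -1.6]   # made-up values
print(flag_uncertain(tokens, logprobs))  # → ['1987']
```

This is exactly where the "all training data is wrong" extreme breaks down: a confidently wrong corpus yields high probability on the wrong tokens, and nothing gets flagged.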
It might also need to be able to distinguish between Knightian uncertainties and probabilities when there is nothing to base things on.
What it needs is a hierarchy of evidence. This works almost unreasonably well right now, either because we're lucky that more digitized text than not is largely true, or because RLHF is just that effective. But at some point the learner has to understand that a chemistry textbook and Reddit carry equal weight for learning how to construct syntactically well-formed sentences with human-intelligible semantic content, but not equal weight with respect to factual accuracy.
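One way to read that hierarchy in training terms: keep every source at full weight for the language-modeling objective, but down-weight unreliable sources in anything factual. A toy sketch; the trust weights and source names are purely illustrative:

```python
# Trust-weighted factual loss: examples from low-trust sources contribute
# less to the "factual" training signal than examples from high-trust ones.

SOURCE_TRUST = {"chemistry_textbook": 1.0, "wikipedia": 0.8, "reddit": 0.2}

def weighted_factual_loss(examples):
    """examples: list of (source, per_example_loss). Returns trust-weighted mean."""
    total = sum(SOURCE_TRUST.get(src, 0.5) * loss for src, loss in examples)
    weight = sum(SOURCE_TRUST.get(src, 0.5) for src, _ in examples)
    return total / weight

batch = [("chemistry_textbook", 0.2), ("reddit", 1.0)]
# Plain mean would be 0.6; down-weighting Reddit pulls it toward the textbook.
print(weighted_factual_loss(batch))
```

The hard part, of course, is that nothing in this sketch says how to separate "syntactic" from "factual" learning inside one next-token objective.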
I think if you consider the audience this was written for, and the various caveats each of your citations involves, the "last bit" does indeed seem to be true if you evaluate it in good faith rather than trying to discredit, on a technicality, a piece that is still useful for the layman.
That's a pretty big technicality. The potential implications if it's right or wrong are entirely different. I also don't understand how simply pointing this out is "trying to discredit the piece". I was about as neutral as I could be and made no comment about the author or their intentions.