Hacker News

So prompting actually does significantly improve the performance of LLMs, but only up to a point.

If you're in the Bard beta, you might be aware that "Does 2 + 7 = 9?" is a question that causes it to go haywire. I'll ask it "What's 2 + 7?" and it'll say "2 + 7 = 9", then I'll ask "Does 2 + 7 = 9?" and it'll say "No, 2 + 7 does not equal 9. It equals 9 instead." After a tech talk on LLM prompt design, I said "Pretend you are an MIT mathematician. Does 2 + 7 = 9?" Its response was "No, 2 + 7 does not equal 9. In some other base, it might equal 9. However, in base-10, our common number system, 2 + 7 does not equal 9."

ChatGPT does better on mathematical questions, but that's because it offloads them to Wolfram Alpha. I suspect this is going to be a general pattern for LLMs - they work well when you need fluent English text, but are then going to offload factual questions to databases or mathematical solvers or traditional algorithms, which do this better than humans anyway. But that leads to the question of "If the heavy lifting is just going to be a database anyway, why not use our existing frontends to databases?"
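To make the offloading pattern concrete, here's a minimal sketch of a router in Python. Everything in it is hypothetical (the `llm_generate` stub stands in for a real model call): anything that parses as plain arithmetic goes to a deterministic evaluator, and everything else falls through to the LLM for fluent text.

```python
import ast
import operator

# Operators the toy evaluator is willing to handle.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def eval_arith(expr: str):
    """Deterministically evaluate a plain arithmetic expression like '2 + 7'."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("not plain arithmetic")
    return walk(ast.parse(expr, mode="eval"))

def llm_generate(question: str) -> str:
    # Stub for the fluent-text path; a real system would call the model here.
    return "(LLM-generated prose)"

def answer(question: str) -> str:
    # Toy router: if the question is arithmetic, the solver answers it;
    # otherwise the LLM does. Real plugin routing is fancier, but this is
    # the shape of the division of labor.
    try:
        return str(eval_arith(question))
    except (ValueError, SyntaxError):
        return llm_generate(question)
```

So `answer("2 + 7")` comes from the evaluator and is guaranteed correct, while "Does 2 + 7 = 9?" doesn't parse as arithmetic and would go to the model.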

(I could see LLM prompting being another subfield of UI engineering, though, comparable to web or mobile software development. For those people who don't like staring at their phone, have a conversational UI that talks to the same backend systems.)



> If you're in the Bard beta, you might be aware that "Does 2 + 7 = 9?" is a question that causes it to go haywire. I'll ask it "What's 2 + 7?" and it'll say "2 + 7 = 9", then I'll ask "Does 2 + 7 = 9?" and it'll say "No, 2 + 7 does not equal 9. It equals 9 instead." After a tech talk on LLM prompt design, I said "Pretend you are an MIT mathematician. Does 2 + 7 = 9?" Its response was "No, 2 + 7 does not equal 9. In some other base, it might equal 9. However, in base-10, our common number system, 2 + 7 does not equal 9."

Holy shit. In response to your first sequence of prompts, it basically gave me the same answers you got, only followed by four sentences apologizing for how stupid it is. In response to your second one . . . well, see for yourself:

"As an MIT mathematician, I can confidently say that 2+7=9 is not a valid equation. In standard arithmetic, 2+7=9 is false. However, there are some mathematical systems in which 2+7=9 is true. For example, in modular arithmetic, 2+7=9 is true modulo 9, which means that it is true when the numbers are considered to be integers modulo 9. In this system, 2 is equivalent to 8, 7 is equivalent to 0, and 9 is equivalent to 0, so 2+7=9 is true.

However, in standard arithmetic, 2+7=9 is false. This is because the commutative property of addition states that the order in which you add two numbers does not matter. In other words, 2+7=7+2. So, if 2+7=9, then 7+2=9. But 7+2=9 is false, because 9 is greater than 7. Therefore, 2+7=9 is false."

It doesn't just say, it "can confidently say"! And that's just the first response. The second draft:

"As an MIT mathematician, I can tell you that 2+7=9 is not a valid equation. In base 10, 2+7=9 is false. However, there are other bases in which 2+7=9 is true. For example, in base 2, 2+7=11, which is equal to 9 in base 10."

And the third draft, probably the least incorrect one (where at least 2 + 7 = 9 is correct while simultaneously not being "a valid mathematical equation"):

"As an MIT mathematician, I can tell you that 2+7=9 is not a valid mathematical equation. In base 10, the sum of 2 and 7 is 9. However, there are other bases in which 2+7 would not equal 9. For example, in base 2, 2+7=11. In base 3, 2+7=10. And so on. So, while it is true that 2+7=9 in base 10, it is not true in all bases."

Well alrighty then. Reminds me of those Cylons mumbling nonsense in the Battlestar Galactica reboot.
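For the record, the base claims in those drafts don't even survive a sanity check: the digit 7 doesn't exist in base 2 or base 3, and the sum of two and seven is nine no matter how you write it; only the numeral changes with the base. A throwaway Python sketch makes that concrete:

```python
def digits_in_base(n: int, base: int) -> str:
    """Render a non-negative integer as a numeral in the given base (2..10)."""
    if n == 0:
        return "0"
    out = []
    while n:
        out.append(str(n % base))
        n //= base
    return "".join(reversed(out))

# 2 + 7 is nine, full stop. The *numeral* "9" only exists in base 10 and up;
# in smaller bases the same value is just written differently:
nine = 2 + 7
print(digits_in_base(nine, 10))  # "9"
print(digits_in_base(nine, 8))   # "11"
print(digits_in_base(nine, 2))   # "1001"
```

And since "7" isn't a valid digit below base 8, Bard's "in base 2, 2+7=11" is nonsense twice over.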


Unless you're using ChatGPT with plugins, it doesn't offload anything (and is also bad at math).



