Hacker News

I want attribution if I inspire a thought in AI.

I'm surprised by how much of a nerve this issue has struck in me.



It isn’t clear to me (other than that it’s an open engineering problem) why LLMs couldn’t also include attribution as part of training. Tracking attribution could also yield insights into how their internal representations in vector space are formed.


That is true. I see no obvious reason why the LLM companies take pride in not being able to document the ideation process. I have no justification, but it feels like deceit rather than a technical limitation.


The issue here is that memorization of any distinguishable part of IP is an incidental aspect - those models aren't memorizing stuff, they're learning it. We don't expect people to keep track of the source of every single piece of information they encounter. It would arguably make learning impossible - as much for humans as for LLMs.

As an intuition pump, when I write "2+2 = " and you mentally complete it with "4", should I chastise you for not completing it with "4, as per ${your elementary class math textbook} and ${that other book you read as a kid}, corroborated by ${your first math teacher} and ${your parent} quoting ${some other work}"?


What is the hard technical barrier that makes the tracking of attribution for input sequences for LLM training impossible? I don't see any.


When you make an omelette, what is the technical barrier making it practically impossible to tell which egg contributed how much to any given part of the meal?

It's roughly the same thing.
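The omelette point can be made concrete with a toy sketch: after a few SGD steps over interleaved examples, a weight holds the blended effect of every update, with no per-example record left behind. Everything here (the one-parameter model, the "source" labels, the numbers) is hypothetical and only illustrates the mixing.

```python
# Toy illustration: gradient updates from different "sources" blend
# irreversibly into one weight, like eggs into an omelette.

def sgd_step(w, x, y, lr=0.1):
    # One gradient step for a 1-parameter linear model y_hat = w * x,
    # minimizing squared error (w*x - y)**2.
    grad = 2 * (w * x - y) * x
    return w - lr * grad

w = 0.0
# Hypothetical training stream interleaving two "sources".
examples = [("source_A", 1.0, 2.0), ("source_B", 2.0, 3.0)] * 3
for _src, x, y in examples:
    w = sgd_step(w, x, y)

# w now reflects every example at once; nothing in the final float
# records which source moved it, or by how much.
print(w)
```

Recovering per-source contributions from the final `w` would require replaying the whole training run, which is roughly why post-hoc attribution from weights alone is considered impractical.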


I have really enjoyed using https://phind.com, which includes attribution in its responses.


Phind’s base model (GPT-3.5/4) doesn’t itself do attribution; it’s made to do that with prompt engineering: a word-embedding vector search retrieves the most relevant materials from the web, and the model is then asked to reference each source in its answer.
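That retrieve-then-cite pattern can be sketched in a few lines. This is not Phind's actual pipeline; the bag-of-words "embedding", the URLs, and the prompt wording are all stand-ins for a real embedding model and prompt template.

```python
# Sketch of retrieval-augmented attribution: rank sources by similarity
# to the question, then prompt the model to cite the ones it uses.
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words term-frequency vector.
    # Real systems use a learned dense embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_cited_prompt(question, sources, k=2):
    # The prompt-engineering step: put the top-k sources in context
    # and ask the model to reference each one it relies on.
    ranked = sorted(sources.items(),
                    key=lambda kv: cosine(embed(question), embed(kv[1])),
                    reverse=True)[:k]
    context = "\n".join(f"[{url}] {text}" for url, text in ranked)
    return ("Answer using only the sources below, citing each [url] "
            f"you rely on.\n{context}\n\nQuestion: {question}")

# Hypothetical source snippets keyed by URL.
sources = {
    "https://example.com/a": "Python lists are mutable sequences.",
    "https://example.com/b": "Omelettes are made from beaten eggs.",
}
prompt = build_cited_prompt("Are Python lists mutable?", sources, k=1)
```

The citation quality here depends entirely on retrieval: the model cites whatever the search surfaced, not necessarily where an idea originated, which is the caveat raised downthread.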


I mean, this is more or less what a student does when writing a paper and forced to cite their sources. They first come up with an idea based on their own understanding/recollection, then they try to figure out where they first got that idea from. If they remember a specific source, they'll cite that; if they don't (because there may not be one specific source they learned from), they'll search for some existing work that expresses the idea in question, and cite that.

I.e. in case of both the student and an LLM, correct citation doesn't actually mean the idea originates from the cited work - only that the work contains this idea.


Thank you. I want to believe responsible development is happening. I just asked an LLM my first question, and watching the interactive processing was great.



