> I wouldn't be surprised if some Wikipedia editors balk at their volunteer work being actively marketed and reformatted for ease of LLM training
As someone who avidly edited Wikipedia for 6-8 years, I am happy to see my volunteer work used for LLM training. I also agree some other editors likely aren't.
Given that all Wikipedia editors have explicitly consented to their contributions being released under the Creative Commons Attribution-ShareAlike 4.0 License, they have no further say over reuse of that content for any purpose, so long as the license's attribution and share-alike conditions are honored.
Redistribution of content is an entirely different matter, and the legal status of copyrighted material in relation to LLM training is an open issue that is currently the subject of litigation.
> "it is important to note that Creative Commons licenses allow for free reproduction and reuse, so AI programs like ChatGPT might copy text from a Wikipedia article or an image from Wikimedia Commons. However, it is not clear yet whether massively copying content from these sources may result in a violation of the Creative Commons license if attribution is not granted. Overall, it is more likely than not if current precedent holds that training systems on copyrighted data will be covered by fair use in the United States, but there is significant uncertainty at time of writing."
The new Wikimedia Enterprise APIs facilitate attribution. For example, the "api.enterprise.wikimedia.com/v2/structured-contents/{name}" response [2] includes an "editor" object inside a "version" object, so attributing the Wikipedia editor who most recently edited an article seems quite feasible. ML apps could incorporate such attribution into their offerings and help satisfy the "BY" clause of the underlying CC BY-SA 4.0 license for Wikipedia content.
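To make that concrete, here is a minimal sketch of composing an attribution string from a response shaped like the one described above. The exact field names (`name`, `url`, `version`, `editor`) are assumptions based on the comment, not a verified API contract, and the sample payload is hypothetical:

```python
import json

# Hypothetical response fragment from
# api.enterprise.wikimedia.com/v2/structured-contents/{name}.
# Field names are assumed, not taken from official API documentation.
sample_response = json.loads("""
{
  "name": "Example_article",
  "url": "https://en.wikipedia.org/wiki/Example_article",
  "version": {
    "editor": {"name": "SomeEditor"}
  }
}
""")

def attribution_line(doc: dict) -> str:
    """Compose a human-readable credit aimed at the CC BY-SA "BY" clause."""
    editor = doc.get("version", {}).get("editor", {}).get("name", "unknown editor")
    return (f'"{doc["name"]}" ({doc["url"]}), '
            f'last edited by {editor}, licensed CC BY-SA 4.0')

print(attribution_line(sample_response))
```

Crediting only the most recent editor is of course a simplification; a fuller approach might link to the article's history page, which CC BY-SA attribution guidance generally accepts for massively collaborative works.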