1. Training AI on copyrighted works is fair use, so it's allowed no matter what the license says.
2. Training AI on copyrighted works is not fair use. In that case it's already disallowed by pretty much every open source license, since nearly all of them require attribution (even ones as lax as MIT do; only the nearly-PD-equivalent ones like CC0, WTFPL, and the Unlicense don't) and AI doesn't give attribution.
So in either case, having a license mention AI explicitly wouldn't do any good, and would only make the license fail to comply with the OSD.
Point 2 misses the distinction between AI models and their outputs.
Let's assume for a moment that training AI (or, in other words, creating an AI model) is not fair use. That means that all of the license restrictions must be adhered to.
For the MIT license, the requirement is to include the copyright notice and permission notice "in all copies or substantial portions of the Software". If we're going to argue that the model is a substantial portion of the software, then only the model would need to carry the notices. And we've already settled that accessing software over a server doesn't trigger these clauses.
Something like the AGPL is more interesting. Again, if we accept that the model is a derivative work of the content it was trained on, then the AGPL's viral nature would require that the model be released under an appropriate license. However, it still says nothing about the output. In fact, the GPL family licenses don't require the output of software under one of those licenses to be open, so I suspect that would also be true for content.
So far, though, in the US, it seems courts are beginning to recognize AI model training as fair use. Honestly, I'm not surprised, given that it was seen as fair use to build a searchable database of copyright-protected text. The AI model is an even more transformative use, since (from my understanding) you can't reverse engineer the training data out of a model.
But there is still the ethical question of disclosing the training material. Plagiarism still exists, even for content in the public domain. So attributing the complete set of training material would probably fall under this kind of ethical question, rather than under the legal questions around intellectual property and licensing agreements. How you go about obtaining the training material is also a relevant discussion, since even fair use doesn't allow you to pirate material; you must still obtain it legally, because fair use only covers how you use material once you have it.
There are still questions for output, but those are, in my opinion, less interesting. If you have a searchable copy of your training material, you can do a fuzzy search of that material to return potential cases where the model returned something close to the original content. GitHub already does something similar with GitHub Copilot, finding public code that matches AI responses, but there are still open questions there too, mostly around matches that may not be in the training data and how much duplicated code needs attribution. But once you find the original content, working with licensing becomes easier. There are also questions about guardrails and how much is necessary to prevent exact reproduction of copyright protected material that, even if licensed for training, isn't licensed for redistribution.
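To make that fuzzy-search idea concrete, here's a minimal sketch in Python using the standard library's difflib. The corpus layout, the near_matches name, and the 0.8 threshold are all my own assumptions, and a real system would need something scalable like shingling or MinHash rather than pairwise comparison:

    import difflib

    # Hypothetical sketch: flag model outputs that are suspiciously close
    # to known training documents. `corpus` maps document names to text.
    def near_matches(output: str, corpus: dict[str, str], threshold: float = 0.8):
        hits = []
        for name, text in corpus.items():
            # ratio() returns a similarity score in [0, 1]; 1.0 is identical.
            score = difflib.SequenceMatcher(None, output, text).ratio()
            if score >= threshold:
                hits.append((name, score))
        return sorted(hits, key=lambda hit: hit[1], reverse=True)

Any hit above the threshold would then be checked against the original content's license before the output is redistributed.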
> The AI model is an even more transformative use, since (from my understanding) you can't reverse engineer the training data out of a model.
You absolutely can; the model is quite capable of reproducing works it was trained on, if not perfectly then at least close enough to infringe copyright. The only thing stopping it from doing so is filters put in place by services to attempt to dodge the question.
> In fact, the GPL family licenses don't require the output of software under one of those licenses to be open, so I suspect that would also be true for content.
It does if the software copies portions of itself into the output, which seems close enough to what LLMs do. The neuron weights are essentially derived from all the training data.
> There are also questions about guardrails and how much is necessary to prevent exact reproduction of copyright protected material that, even if licensed for training, isn't licensed for redistribution.
That's not something you can handle via guardrails. If you read a piece of code and then produce something substantially similar in expression (not just in algorithm and comparable functional details), you've still created a derivative work. There is no well-defined threshold for "how similar"; the fundamental question is whether you derived from the other code or not.
The only way to not violate the license on the training data is to treat all output as potentially derived from all training data.
> FWIW, people here illegally are already not eligible for Medicaid, [0] so it's hard to see why ICE having access to a roster of Medicaid enrollees would help them with their stated mission of enforcing removal orders.
Presumably, it's because a lot of them are getting Medicaid despite not being eligible to. Isn't the point of every audit, investigation, etc. to find things that aren't being done correctly?
> Presumably, it's because a lot of them are getting Medicaid despite not being eligible to
Why are you presuming this? There is no evidence this is happening in any widespread fashion.
> Isn't the point of every audit, investigation, etc. to find things that aren't being done correctly?
If it is being honest about its intention, yes. I think we have seen an absolute mountain of evidence that this administration does "audits" as massive data collection waves to suit any and every purpose they want, though.
If this was about fixing things being done incorrectly, DHHS should be doing the audit, not DHS. Perhaps the latter doesn't understand the difference between the two, though, not noticing they're missing an H in their abbreviation.
No evidence because there has been no investigation. The massive Somali fraud had no evidence until a random YouTuber started knocking on quality learning center doors; now lots of new evidence has been found.
If there are massive frauds, DOGE should've revealed them. The fact that people keep spewing "no investigation" when there have already been several shows how ignorant people are.
With one extra caveat. From `man 7 tcp`: "As currently implemented, there is a 200 millisecond ceiling on the time for which output is corked by TCP_CORK. If this ceiling is reached, then queued data is automatically transmitted."
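For the curious, a minimal sketch of the cork/uncork pattern in Python on Linux (the host and request bytes are just placeholders):

    import socket

    # Linux-only: hold small writes back with TCP_CORK so they leave as
    # full segments, then uncork to flush.
    sock = socket.create_connection(("example.com", 80))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)  # cork: queue output
    sock.sendall(b"GET / HTTP/1.1\r\n")
    sock.sendall(b"Host: example.com\r\n\r\n")
    # If the cork stays on for more than ~200 ms, the kernel transmits
    # the queued data anyway (the ceiling from man 7 tcp).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)  # uncork: flush now
    sock.close()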
This is the only way I could come up with that allows an end user to do a full factory reset and end up back in a known-good, secure state afterwards.
Storing the key in the firmware would mean every user has the same key. Storing it in EEPROM means a factory reset will clear it. This lets me ship hardware with the default key on a sticker on the side, and lets a non-technical user reset the device back to that key if they need to.
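A rough Python model of the boot-time key selection that implies; the names, the key length, and the all-0xFF "erased" convention are my assumptions, not the actual firmware:

    KEY_LEN = 16
    DEFAULT_KEY = b"\x01" * KEY_LEN          # stands in for the sticker key

    def select_key(eeprom: bytes) -> bytes:
        """Use the user-set key from EEPROM unless it has been erased."""
        stored = eeprom[:KEY_LEN]
        if stored == b"\xff" * KEY_LEN:      # erased EEPROM reads back as 0xFF
            return DEFAULT_KEY               # factory reset -> sticker key again
        return stored

    # After a factory reset wipes the EEPROM, the device answers to the
    # default key printed on the sticker, so a non-technical user can
    # always recover it:
    assert select_key(b"\xff" * KEY_LEN) == DEFAULT_KEY
    assert select_key(b"\x42" * KEY_LEN) == b"\x42" * KEY_LEN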
If you break the government's rules, that should be between you and the government. I shouldn't have to front the cost of any fines or otherwise be in the middle of it.
You've inadvertently completed both parts of a proof by cases. We don't want speeding laws enforced at all right now, because most speed limits are way too low, because they're set for reasons other than actual traffic safety. Let's raise all speed limits to the 85th percentile speed first and only then talk about stepping up enforcement.
Let's not. The Xth percentile speed is not an appropriate measure for a few reasons:
1. Humans are not generally capable of sufficiently accurate long-term low-incidence risk assessment. Meaning, you irrationally value potentially getting to work 10 seconds faster over a 50% increased chance you run over a child crossing the street.
2. Humans are subject to too many irrational psychological factors; stuff like:
• False sense of security due to sitting in a box isolated from the outside world, that's advertised to keep them "safe" in case of a collision.
• Herd mentality, e.g. "everyone's going over the limit, so I will too". Bonus points for rationalizing this behavior "because it's safer to go at the speed of traffic!".
• Delusional rationalizations like "if the limit is 50 then going 10 over must be fine too, due to <reasons>!". Bonus points for applying the "5/10/15/20 over" rule for every possible speed limit — basic maths and physics say hello!
3. The speed humans will travel at on a given road depends primarily on what speed that road seems designed for. People will drive faster on straight, wide roads and slower on winding, narrow ones, regardless of the speed limit. Changing speed limits has little effect compared to changing the physical infrastructure. Show me a picture of a road and I'll tell you how fast people will drive on it.
As such, it makes no sense to first make some sort of a road and only then figure out the limits by observing real traffic. Figure out the appropriate limit first, then design the road with it in mind.
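To put a number on the "basic maths and physics" point from item 2 above: kinetic energy (and with it crash severity and braking distance) scales with the square of speed, so a flat "10 over" matters far more at low limits. A quick back-of-the-envelope calculation:

    # Extra kinetic energy from going 10 km/h over various limits.
    # KE ~ v^2, so the relative increase is (v2^2 - v1^2) / v1^2.
    for limit in (30, 50, 80, 120):  # km/h
        actual = limit + 10
        extra = (actual**2 - limit**2) / limit**2 * 100
        print(f"{limit} -> {actual} km/h: +{extra:.0f}% kinetic energy")
    # Prints: +78% at 30, +44% at 50, +27% at 80, +17% at 120.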
> Bonus points for rationalizing this behavior "because it's safer to go at the speed of traffic!".
But that's true (look up the Solomon curve), and it's exactly why the 85th percentile would be better.
> Delusional rationalizations like "if the limit is 50 then going 10 over must be fine too, due to <reasons>!". Bonus points for applying the "5/10/15/20 over" rule for every possible speed limit — basic maths and physics say hello!
You have cause and effect backwards. People think it's safe to go over the speed limit precisely because most speed limits are too low.
> Changing speed limits has little effect compared to changing the physical infrastructure. Show me a picture of a road and I'll tell you how fast people will drive on it.
Right. So even if going slower is safer, just making the speed limit lower won't accomplish that.
I'll agree with you regarding major arterials but disagree when it comes to suburban neighborhoods. What feels safe from the perspective of someone operating a vehicle can be quite different than what's actually safe when there are pedestrians and cars unexpectedly popping out of driveways.
> What feels safe from the perspective of someone operating a vehicle can be quite different than what's actually safe when there are pedestrians and cars unexpectedly popping out of driveways.
That's all the more reason to raise speed limits on the major roads. Speed limits being more reasonable there makes it more likely that drivers would abide by them even on those smaller residential streets.
And besides, as other commenters pointed out, even if things get lost in the mail or the government otherwise drops the ball, they'll still consider that your fault.
Really? I haven't had any problems, even with computers that don't meet the official hardware requirements.
Download the Win11 Pro ISO, extract it to a USB drive and then execute the command below from it for a totally automated install that bypasses all the BS.
.\setup.exe /product server /auto upgrade /EULA accept /migratedrivers all /ShowOOBE none /Compat IgnoreWarning /Telemetry Disable
You're welcome!
PS: I know it says "server" but when upgrading a desktop machine, desktop is what you will get --- minus a lot of BS.
I believe you that that way works today, but once knowledge of it starts to spread, I expect Microsoft to break it, just like they previously broke Shift+F10 "oobe\bypassnro" and "start ms-cxh:localonly".
That's exactly my point: they will keep closing loopholes, but they will never truly stop people doing it without removing local accounts completely, which they can't do.
It has worked all along and MS can't break it because I have the ISO that it works with.
It's unlikely it can be broken without totally abandoning the server market and disrupting a lot of existing installations --- which would be a marketing disaster.