Point 2 misses the distinction between AI models and their outputs.

Let's assume for a moment that training AI (or, in other words, creating an AI model) is not fair use. That means that all of the license restrictions must be adhered to.

For the MIT license, the requirement is to include the copyright notice and permission notice "in all copies or substantial portions of the Software". If we're going to argue that the model is a substantial portion of the software, then only the model would need to carry the notices. And we've already settled that accessing software over a server doesn't trigger these clauses.

Something like the AGPL is more interesting. Again, if we accept that the model is a derivative work of the content it was trained on, then the AGPL's viral nature would require that the model be released under an appropriate license. However, it still says nothing about the output. In fact, the GPL family of licenses doesn't require the output of software under one of those licenses to be open, so I suspect the same would be true for content.

So far, though, in the US, it seems courts are beginning to recognize AI model training as fair use. Honestly, I'm not surprised, given that it was seen as fair use to build a searchable database of copyright-protected text. The AI model is an even more transformative use, since (from my understanding) you can't reverse engineer the training data out of a model.

But there is still the ethical question of disclosing the training material. Plagiarism still exists, even for content in the public domain. So attributing the complete set of training material would probably fall into this form of ethical question, rather than the legal questions around intellectual property and licensing agreements. How you go about obtaining the training material is also a relevant discussion, since fair use doesn't allow you to pirate material - you must still obtain it legally, and fair use only governs what you can do with it once you have it.

There are still questions for output, but those are, in my opinion, less interesting. If you have a searchable copy of your training material, you can do a fuzzy search of that material to return potential cases where the model returned something close to the original content. GitHub already does something similar with GitHub Copilot, finding public code that matches AI responses, but there are still questions there, too - more around matches that may not be in the training data, or how much duplicated code needs to be attributed. But once you find the original content, working with licensing becomes easier. There are also questions about guardrails and how much is necessary to prevent exact reproduction of copyright-protected material that, even if licensed for training, isn't licensed for redistribution.
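To make the fuzzy search idea concrete, here's a minimal sketch in Ruby using word-shingle overlap (Jaccard similarity). The corpus, threshold, and shingle size are all illustrative assumptions on my part, not anything GitHub actually does:

    require "set"

    # Break text into overlapping word shingles; comparing shingle sets is a
    # cheap way to spot near-verbatim overlap between two pieces of text.
    def shingles(text, size = 5)
      text.downcase.scan(/\w+/).each_cons(size).map { |run| run.join(" ") }.to_set
    end

    # Jaccard similarity: shared shingles over total distinct shingles.
    def similarity(a, b)
      sa, sb = shingles(a), shingles(b)
      return 0.0 if sa.empty? || sb.empty?
      (sa & sb).size.to_f / (sa | sb).size
    end

    # Hypothetical training corpus: id => full text.
    training_corpus = {
      "some-repo/util.rb" => "def add(a, b)\n  a + b\nend",
    }

    model_output = "def add(a, b)\n  a + b\nend"

    # Flag any training document suspiciously close to the model's output.
    training_corpus.each do |id, doc|
      score = similarity(model_output, doc)
      puts "possible match: #{id} (#{score.round(2)})" if score > 0.6
    end

A real system would use an index rather than a linear scan, but the principle is the same: flag the near-matches, then have a human check the original's license.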


> The AI model is an even more transformative use, since (from my understanding) you can't reverse engineer the training data out of a model.

You absolutely can; the model is quite capable of reproducing works it was trained on, if not perfectly then at least close enough to infringe copyright. The only thing stopping it from doing so is filters put in place by services to attempt to dodge the question.

> In fact, the GPL family of licenses doesn't require the output of software under one of those licenses to be open, so I suspect the same would be true for content.

It does if the software copies portions of itself into the output, which seems close enough to what LLMs do. The neuron weights are essentially derived from all the training data.

> There are also questions about guardrails and how much is necessary to prevent exact reproduction of copyright-protected material that, even if licensed for training, isn't licensed for redistribution.

That's not something you can handle via guardrails. If you read a piece of code, and then produce something substantially similar in expression (not just in algorithm and comparable functional details), you've still created a derivative work. There is no well-defined threshold for "how similar", the fundamental question is whether you derived from the other code or not.

The only way to not violate the license on the training data is to treat all output as potentially derived from all training data.


> You absolutely can; the model is quite capable of reproducing works it was trained on, if not perfectly then at least close enough to infringe copyright. The only thing stopping it from doing so is filters put in place by services to attempt to dodge the question.

The model doesn't reproduce anything. It's a mathematical representation of the training data. Software that uses the model generates the output. The same model can be used across multiple software applications for different purposes. If I were to go to https://huggingface.co/deepseek-ai/DeepSeek-V3.2/tree/main (for example) and download those files, I wouldn't be able to reverse-engineer the training data without building more software.

Compare that to a search database, which needs the full text in an indexable format, directly associated with the document it came from. Although you can encrypt the database, at some point, it needs to have the text mapped to documents, which would make it much easier to reconstruct the complete original documents.
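As a toy illustration of that contrast (documents and names made up), here's how an inverted index keeps every term tied to the document it came from, which is exactly why the original text falls right back out of it:

    # Map each term to [document id, position] postings.
    index = Hash.new { |hash, key| hash[key] = [] }

    documents = { "doc1" => "the quick brown fox", "doc2" => "the lazy dog" }

    documents.each do |id, text|
      text.split.each_with_index do |term, pos|
        index[term] << [id, pos]
      end
    end

    # Reconstructing a document is just reading the postings back out.
    recovered = index.flat_map do |term, postings|
      postings.select { |id, _| id == "doc1" }.map { |_, pos| [pos, term] }
    end
    puts recovered.sort.map(&:last).join(" ")  # => "the quick brown fox"

Model weights have no such per-document mapping to walk back through.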

> That's not something you can handle via guardrails. If you read a piece of code, and then produce something substantially similar in expression (not just in algorithm and comparable functional details), you've still created a derivative work. There is no well-defined threshold for "how similar", the fundamental question is whether you derived from the other code or not.

The threshold of originality defines whether something can be protected by copyright. There are plenty of small snippets of code that can't be protected. But there are still questions about these small snippets when they were consumed in the context of a larger, protected work, especially since there are only so many ways to express the same concept in a given language. This is definitely easier to reason about in written text than in code.


> The model doesn't reproduce anything. It's a mathematical representation of the training data. Software that uses the model generates the output.

By that argument, a compressed copy of the Internet doesn't reproduce the Internet, the decompression software does. That's not a useful semantic distinction; the compressed file is the derived work, not the decompression software.


The premise of this whole post is incorrect. If an organization is building an AI product or offering an AI service, then a SOC 2 report - specifically a SOC 2 Type 2 report - should answer these questions.

"What happens if someone tries to extract training data?" CC6.7 covers data loss and data transfer restrictions. I've typically included controls related to monitoring data transfer, including flagging and highlighting potential breaches. Documented procedures on what happens if data loss or unauthorized data transfer occurs. These can be reviewed, but may be hard for the auditor to test unless they were executed and there's evidence that they were executed as written.

"Can this agent be manipulated into accessing data it shouldn't? How do you test for adversarial attacks?" I'm struggling to understand the difference between these questions. It seems like part of the answer likely overlaps with controls to address CC6.7 and data loss or data transfer restrictions. CC8.1 discusses testing the product or service.

"How do you prevent prompt injection?" This may be a bit specific for a SOC 2 Type 2 report, since it really gets into requirements, architecture, and design decisions rather than controls over the requirements, architecture, and design. That is, you can essentially not require preventing prompt injection and follow all of your controls related to, for example, CC8.1. CC8.1 talks about managing, authorizing, executing, and documenting changes. You can do all of these things well without that requirement in place.

"What guardrails are in place, and have they been validated?" This is the entire SOC 2 Type 2 report. It lists all evaluated criteria, describes the organization's controls, and provides an audit of those controls. It's up to the organization being audited, however, to think about what controls are necessary for their context. The controls that should be in scope of the audit will differ for an AI product or service than for something else. The recipient of the SOC 2 report can review the controls and ask questions.

Part of the burden is on the organization getting the SOC 2 audit report to think about what controls they need. But there's also a burden on the organization reviewing the audit report: not just to see that there are no exceptions, but to review the controls described and make sure they are appropriate for the given product or service. This detailed information about the controls is what makes something like a SOC 2 audit report a whole lot more useful than something like an ISO 27001 certificate, which only says that whatever policies and procedures are in place meet the requirements of the standard, without offering details on how those requirements are met.


I took a quick look through what I have access to.

I started with the CSA's Cloud Controls Matrix, just because it traces to a bunch of other standards. It has a control - IAM-07 - that is to "de-provision or respectively modify access of movers / leavers or system identity changes in a timely manner in order to effectively adopt and communicate identity and access management policies". This control points to other sources.

One standard that I have access to, CIS Critical Security Controls v8.1, calls for a process that disables or deletes accounts and revokes access "immediately upon termination, rights revocation, or role change of a user". I believe v8.1 is the latest version.

The Trust Services Criteria mapping is to CC5.3 and CC6.3. This defers to defined organizational policies and procedures and doesn't specify any timelines.

The ISO/IEC 27001:2022 and ISO/IEC 27002:2022 mapping is to A.5.15 and A.5.18. This is identified as a gap in earlier versions of these standards. I don't have ready access to them, so I can't tell you if they give any timelines.

The NIST 800-53 rev 5 mappings are to AC-2 (1, 2, 6, 8), AC-3 (8), AC-6 (7), AU-10 (4), AU-16 (1), and CM-7 (1). All of these appear to defer to organizationally-defined timing and frequency for review.

The NIST CSF v2.0 mapping is to GV.RR-04, GV.SC-10, PR.AA-01, and PR.AA-05. The most relevant ones are the PR.AA controls and they both defer to organizational policies for definitions.

As far as I can tell, most standards simply require that a company define its policies and procedures, and then certification or audit against the standard only ensures that the documented policies are being followed. If you wanted to implement "immediately", one way to do it would be to document that as your process (optionally by adopting the CIS Critical Security Controls, though you may or may not want to adopt the whole set) and then have it covered in a SOC 2 Type 2 audit, where the auditor would sample people who have left the organization and check when their access was revoked.


If the copyrighted code was uploaded to GitHub by the owner, there's no problem with this. When you upload code to GitHub, one of the rights that you grant to GitHub is the right to use your content for "improving the Service over time". See D.4. License Grant to Us in the GitHub Terms of Service. Once it is up there, you also grant other users certain rights, like viewing public repos and forking repos into their own copies. See D.5. License Grant to Other Users. Even with the most restrictive protections in place, using GitHub requires you to give up certain rights.

A question would be whether creating and training Copilot counts as "improving the Service over time". I suspect that it does, though.

There are still some open questions around what happens when Copilot suggests code verbatim, but these are mostly for the users of Copilot. I would hope, though, that GitHub is thinking about offering information to help users understand the source of the code they use, whether it may be protected, and what licenses it may be offered under. There are still some interesting legal questions here, but I don't think that the training of Copilot is one of them.

A more interesting question would be what GitHub does if someone uploads someone else's copyright-protected code to GitHub and it is used for training Copilot before it is removed. If you don't own the copyright, you can't grant GitHub the rights needed to use that code for anything, including improving the service.


> A question would be whether creating and training Copilot counts as "improving the Service over time". I suspect that it does, though.

Definitely an interesting case to be had, but I'd argue that it does not. They're using their customers' code to create an entirely new product that would not be possible without it, not just improving their ability to host a Git repo. Otherwise, what's the limit of "improving the service over time"? Can they do anything with the code they host as long as it improves their service? What about selling bootleg copies of it and using the proceeds to upgrade their servers?


However, D.4 also explicitly says "This license does not grant GitHub the right to sell Your Content". One could argue that because Copilot is a commercial product, it is in fact selling (a derivative of) user code, and thus the grant in D.4 does not apply.


Most of the comments on that app, as well as here, are probably wrong. I'd suspect that everyone who had the app "installed without their permission" opted into the Android COVID-19 Exposure Notification program. This was deployed by Google as part of an update to Google Play Services.

When you go to your phone's settings with this update, there's an option to enable COVID-19 Exposure Notifications. When you turn it on, it prompts you for your location and will download your region's app that uses your phone's new capabilities to connect to the appropriate health authorities.

Massachusetts just opted into this program in the last couple of weeks. I'm honestly not sure why they did it so late - this would have been helpful earlier. Apple iPhones also have this capability, including interoperability with Android phones, and iPhone users in Massachusetts are also able to turn on this setting.

Now, if someone can actually prove that they didn't opt into the COVID-19 Exposure Notifications, then I'd be concerned. But my guess is they opted in when it came out, but there was no app for their region, so nothing was downloaded and the feature did nothing. Then, Massachusetts rolled out the app now and lots of people who configured their phones earlier in the pandemic got a new app. They granted permission for it, perhaps months ago.


I don't know what kind of proof you want, but I just looked at my phone settings after reading your comment. The exposure notification option is there, and it's off. The region selection is grayed out because of it. Yet I got the app (I uninstalled it after I saw this on Hacker News).

I did get a notification when it got installed but I thought it was just a push similar to amber alerts. I didn't realize it installed something at the time.

Still, exposure notification was never turned on.


I'm in Boston and it wasn't installed on my phone (exposure notifications have always been off AFAIK). I'm on an old iPhone 5s; not sure if that makes a difference, or maybe it's just specific areas? According to this, https://thesomervillenewsweekly.blog/2021/04/05/massnotify-a..., different cities were piloting at different times, although it all seems opt-in.


The submission is specifically about a Google Play app being auto-pushed, so being on an iPhone would certainly protect you from it :)


Also Boston area. I got a notification on my iPhone that I could turn it on.


Same here. Never opted in, just checked and that hasn't changed. I hadn't even selected a region, so it shouldn't even know which invasive app to install, but I still got it.


Ditto. 10 minutes before I saw this post I declined the opt-in notification for exposure notifications, yet I still had the app.


I'm an MA resident and this app was on my (Android) phone...until a few minutes ago when I read about it on Hacker News, found it, and deleted it.

I have no memory of ever opting into the program you describe, and it isn't the type of thing I would normally do. It's possible I guess.

In any case, the way they did this is creepy. There was no icon for the app; I had to look in Settings/Apps & Notifications to find it. And neither the official state press releases nor the few local news stories about it mention that the app was installed without notice. They use vague, lawyerly language about how it can be "enabled".


> In any case, the way they did this is creepy. There was no icon for the app; I had to look in Settings/Apps & Notifications to find it. And neither the official state press releases nor the few local news stories about it mention that the app was installed without notice. They use vague, lawyerly language about how it can be "enabled".

This incident and your comment reminded me of a story Bezos mentioned in his interview about the time Amazon deleted 1984 from the Kindle. The analogy he made makes me wonder how we can compare what happened here to what Amazon did.

“Without any notice or warning just electronically go into everybody’s Kindle, who had downloaded the book and just disappear it…so it would be as if we walked into your bedroom in the middle of the night, found your bookshelf, and just took that book away”

19:48 https://youtu.be/SCpgKvZB_VQ


MA resident as well. What worries me more is that someone thought this method of installation was a good idea, and even more worrying is that they were able to execute on it. The lack of a public announcement about it feels rather shady and nefarious. Shenanigans like this are how you get the populace to trust the local government less, which is the last thing this country needs.


It's actually great that this happened. It showed everybody that the government can install whatever it wants on your phone without your consent or knowledge. In this case, they decided to leave you the option to uninstall, but in the future they might not, and could spy on you at will. It's another reminder that you are not the owner of your device.


- In this case, they decided to leave you the option to uninstall, but in the future they might not, and could spy on you at will.

Then they'll be just like Google, Fecebook, Amazon, etc, etc.


Which leads to the question: if anyone powerful or wealthy enough can take total control of your phone, how comfortable do you feel with that?

There are two routes here. One is the European way, i.e., trying to fix it with a legal framework. The other is a technical solution like Purism, which is still very far from mainstream. The sooner people realize they have a problem, the sooner they start organizing to find a solution.


My kids really want an Oculus, but I absolutely refuse to let Facebook into our house any more than I knowingly have to; I'm sure they've weaseled in other ways I don't know about yet.


I don't see anything bad with people not trusting their local government... Exhibit A


Maybe they shouldn't blindly trust it, but they should be able to hold it accountable.


Well, given that a significant percentage of the citizenry is anti-vaxxer-level stupid, there isn't much improvement over people trusting their local government either…


Considering that in the not-too-distant past (pre-COVID) the "anti-vaxxers" were all liberal/granola types, and now they are magically all conservative/racist types, perhaps you may want to reassess your two-dimensional view of the real world. I believe a cogent example was on the front page of HN just a day or two ago, but IANYG.


"Not too distant" means only five years ago. The "anti-vaxxers" were people who lived in primarily white, primarily wealthy, primarily urban or suburban environments and who refused (usually) the MMR vaccine.

You don't find measles outbreaks in rural Mississippi. You find them in Washington, New York, and California. [1]

So it's pretty rich to label someone as an "anti-vaxxer" for refusing the experimental, emergency-use, mRNA jabs, when that person has never demonstrated even the slightest hesitancy about receiving or administering every other approved vaccine.

1. https://en.wikipedia.org/wiki/Measles_resurgence_in_the_Unit...


Labels like anti-vaxxer just seem to be weaponized propaganda to me. It's a cheap, easy way to discredit someone you don't agree with. Hopefully, as time goes on and disparaging groups throw these accusations back and forth at each other, it will eventually dilute their meaning and impact.


Lol. I never mentioned the political leanings of the anti-vaxxer type, just that they are either very stupid people or misinformed by propaganda originating somewhere. It's interesting to see multiple downvotes on my comments from the folks who probably saw what's not written up there, just like you did. The "conservative/racist type", you said?

Quick question: what made you put the labels conservative and racist together?

Also, a granola-bar-eating liberal might be stupid, but their actions do not put anyone else in danger. An anti-vaxxer, however, is a risk to society in that they are an active and potential host for a disease in circulation.


Wow, I thought I was someone who didn't get the app when I checked the icons but once I went into settings, there it was. I even have a NH phone number but live in MA.


Did you get vaccinated? If so, did you supply the email address tied to your Google account on the form, or enough other information to link the two? Did you read all of the related documentation? I wouldn't be surprised if they slipped in somewhere on the form that you were agreeing to it.


I did supply my email but it's not a Gmail or Google for Work email address nor a domain tied to those. Exposure notification is clearly off. Still got the app.


There's even a standard for mobile operators to control the setting in your modem and update/install apps: https://en.wikipedia.org/wiki/OMA_Device_Management

I reverse engineered what this does in practice on the PinePhone modem (Quectel EG25G), for example, and there are pre-compiled binaries there for T-Mobile and Vodafone that process their particular OMA DM flavors, download some configuration and code from the internet, and run it as root on the modem's SoC ARM CPU. (That's still isolated over USB from the main PinePhone SoC, but obviously not good.) It's also thankfully disabled by default, but if you google for OMA DM on Android, you get reports of this protocol still being used.

Whatever it does on a regular Android phone depends on how well it is implemented on Android. Regular phones don't have two almost-isolated SoCs like the PinePhone, so the OMA DM client would probably run on the main SoC, and everything depends on how secure that binary blob is and what it does or allows the operator to do.

Quectel's software is a bit of a turd, so I wouldn't conclude from this that operators generally can make the device download random code and run it as root using this protocol. But most proprietary software like this is pretty shit, so I wouldn't feel warm and fuzzy safe on a random Android device either.


Can one use PinePhones to collect these blobs, and then try to run them in an Android emulator or whatever for more specific knowledge about operators' practices?


It's quite modem-specific. You'll get more information just decompiling them.


I was about to say it might be through the carriers. I put a Verizon SIM in my phone and got a bunch of BS apps installed a few days later.


I just went through the Exposure Notifications flow on Android, and selected a region where it's not currently available (Arkansas). It displayed a message saying it wasn't supported in my region, and left the setting disabled. While it's still possible that your theory is correct, I certainly don't think it's the intended flow as of now.


I have no memory of opting in. I checked under Settings -> Google, and "COVID-19 Exposure Notifications" was set to "Off", yet the MassNotify app was still installed on my phone. It has no icon, and the only way to find it is by going to Settings -> Apps & notifications -> See all apps, where it comes up under "Massachusetts Department of Public Health".

Then, when you go to the Google Play Store and search "MassNotify" or "mass notify" or even "Massachusetts Department of Public Health" (the exact name of the app), it doesn't come up in the search results. You have to go to "Manage apps & device" on the Google Play Store and scroll down to "MassNotify", which doesn't even match the name of the app in the other settings menu. This is pretty shady.


I just found this app and removed it. And I definitely did not opt into any kind of COVID tracking earlier.

This app seems to use Bluetooth to track potential violations of 6ft personal space and notify people if someone from that list later tests positive for COVID. Whatever the noble goal is, I do not want it on my phone; this is creepy!


When you opt in, does it notify you of all the permissions the app will require?

- view network connections

- pair with Bluetooth devices

- full network access

- run at startup

- prevent device from sleeping


Virtually every non-trivial Android application has these permissions, none of which are even important enough for the system to prompt you for permission. The only interesting one is "pair with Bluetooth devices" which is how the Exposure Notifications system works.


Users expect to see the requested permissions.


All these permissions are granted without ever being shown to the user, due to being in the "other" category. If you install this app normally, Android will never ask you for permission, but just silently grant these permissions.


> The permission modal says this [0].

[0] https://news.ycombinator.com/item?id=27558825


On Android 6.0 (2015) and later, there is no permission modal if all permissions are in the "other" category, as they are in this case.

Android 6.0 introduced runtime permission requests, where critical permissions had to be requested (and could be denied) at runtime.

At the same time, it removed all modals for non-critical permissions.


"full network access" is a hugely important permission.

My cynical side believes that the reason it's not as visible as other permissions is that platforms profit from the ad-driven app model, which itself heavily relies on an app's ability to access the internet.

That could also be why stock ROMs do not allow users to disable full network access on a per-app basis (like you can with, for example, the camera permission).


It's actually not made disableable because there are so many ways to bypass it.

For example, just trick a user into clicking a hyperlink that opens another app, like a browser, which does have full internet access, and you have successfully exfiltrated any data in the URL.


Seems like a weak excuse.

I mean sure, you could do that, but it would be complicated, conspicuous, and tiring for the user, and you would still only get one-sided, occasional transfer. It could exfiltrate data, albeit suspiciously, but it wouldn't work for ads... which are the likely motivating factor.

Another motivating factor may be tracking, which Google and vendors want to do, but I'm not sure what their stance would be on others tracking their users.


Yeah, this also seems like the most logical reason to me. If your business depends on people seeing ads in apps, why give users a way to circumvent them?


I have no memory of opting in to this, but it was installed on my phone.

Updated to add: well, I'll be - an hour after writing this comment and seeing via the link that MassNotify was installed, I was prompted to opt in, apropos of nothing.


If it makes you feel better (or worse) I specifically opted out and this app is installed


Another MA resident here. Never opted in and it still shows I'm not. The app was silently installed on my Android. There's no icon so I thought it didn't install at first, until I looked at my app list in settings.

I'm curious to know if there are any MA Android users who previously removed Google Play, and if they still have the app or not. My guess is no?


You can't remove Google Play in Android versions beyond 6, I believe.

You can only disable it.


You can also flash a custom ROM and just not install gapps.


Sorry, I was referring to custom ROMs.


This speculation is 100% wrong. I checked for this app after seeing this post, and it was listed under available updates (it was installed already).

So I decided to check whether I was in fact opted in, and I was not. Everything was off, and this app was still installed without my consent. I do have automatic UPDATES turned on, but that shouldn't tell Google to just push whatever they want to me. You should probably edit your post to say that your speculation is wrong.

I don't know what kind of proof you want, but I 100% never opted in.


Lol, it just got installed on my tablet. It wasn't there earlier.


This is a great explanation for what's occurring. I'll be interested to see what comes of all of this.

So far what I guess is:

- This is likely a government action via telco and not something done via Google* (*Unless they've opted into a program like the one you stated)

- The phones being affected COULD BE all carrier-locked phones, which have specific terms allowing such behavior.

To me, this is a pretty clear-cut violation of Google's Device update policy and could be considered malware or stalkerware (by their definition): https://support.google.com/googleplay/android-developer/answ...

https://support.google.com/googleplay/android-developer/answ...

-----

I think we should all slow down on putting full blame on Google here and focus on government abuse and overreach of power.


"These phones being affected COULD BE all Carrier Locked phones which have specific terms to allow such behavior." I use a unlocked Pixel 4a on Google Fi and still got the app.


I can only speak for myself, but I checked my settings and the COVID-19 Exposure Notifications setting is set to "Off", yet I still had this app pushed silently to my phone. What's even worse is there's no app icon for it on the device and it doesn't show up in your app list. I only knew it was on my device at all because I have auto updates turned off and it was in the queue waiting to be updated in the Play Store.


I never opted in, the setting for COVID notifications has always been OFF, and I still got the app silently installed on my Android phone.


I wasn't opted in. I recently moved to Massachusetts, and the app was probably installed during the last system update. I remember seeing a prompt after rebooting my phone to finish the update (this week, Pixel 3a) asking to enable contact tracing. I said no, but obviously the app had already been installed automatically, and apparently it stayed.


To clarify: It's in your Google Account settings, not a separately broken-out setting that you see when you first bring up your phone settings, or at least it's that way on my phone.


You can be concerned by reading the top comment on this HN thread.


I think this is why the PDF is actually the "Guide to the Software Engineering Body of Knowledge". It's not a representation of the complete body of knowledge itself; it extracts key concepts and terms and provides pointers to the things that are most relevant. If things become irrelevant or are disproven over time, the guide to the body of knowledge would remove those terms, concepts, or references and point to something else.


It actually apes the Project Management Institute's terminology exactly. The PMBOK Guide and the PMBOK are the same thing (as far as I know). This takes exactly the same approach: the guide is the entire book; it just has a funny name, as if there were another, larger book that this is a guide to. There isn't.

https://www.pmi.org/pmbok-guide-standards/foundational/pmbok


Unfortunately, where it does present "key concepts and terms", that presentation is often flawed. I think it would be more useful as a guide to the field of software engineering if that's the only thing it tried to be, setting out a well-considered structure but then referring to other reliable sources for details. It could be a lot shorter and more accessible, and it wouldn't keep saying things that are misleading or incorrect.


I'm thinking of a distinction between, e.g., a body of knowledge meaning the arrangement of muscle, skeleton, and organs, versus a corpus of knowledge that you presume to be a representative collection of body parts making up an adequate whole.


I wonder if the people involved in approving and conducting this research are aware of the ACM's Code of Ethics. I can see pretty clear links to at least two or three of the code's ethical principles. This seems to be a pretty serious breakdown, both of the researchers' understanding of their ethical responsibilities and of the review and approval of research projects.


This just bit me.

The first thing that I noticed was that some people are not understanding the GPL. It's far more impactful to Rails than to the vast majority of web applications built using Rails. The use of GPL'd files means that the gem itself has to be released under the GPL. And since the gem is under the GPL, anything that depends on it is also subject to the GPL. That would include Rails. However, even if Rails were under the GPL, organizations could still build closed-source web applications using Rails, since network access is not distribution. That's the whole point of the AGPL.

However, it does raise a lot of questions about when someone is allowed to yank a gem (or any library, really). It's been a while since I took a deep dive, but I was under the general impression that there was some leeway around not breaking the world when rectifying license issues. I would think that releasing new versions under the correct license and giving everyone notice and time (30 days?) to update would be fine for most copyright holders. I'd suspect that most open source developers wouldn't want to break the world. The sudden yanking with no warning caused builds to fail everywhere.

The absolute worst thing, though, was that changing a license should not be a minor (or a major) version number increase. It should be a patch. The breaking was simply because Rails is pinned to 0.3.x, but the first release under the new license was 0.4.x. Fortunately, the author released a 0.3.6 patch with the correct license, so it's just a matter of a bundle update to get the latest version. But if he hadn't, Rails would have had to release a new version and anyone on legacy/unsupported Rails versions would be hosed if they had to rebuild and redeploy.
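For anyone who hasn't internalized how that pinning works, you can check the pessimistic constraint directly in Ruby (the "~> 0.3.2" pin here is my illustrative guess; check Rails' own gemspec for the exact constraint):

    require "rubygems"

    # A pessimistic constraint allows later patch releases but not the next minor.
    pin = Gem::Requirement.new("~> 0.3.2")

    pin.satisfied_by?(Gem::Version.new("0.3.6"))  # => true: the fixed release resolves
    pin.satisfied_by?(Gem::Version.new("0.4.0"))  # => false: a 0.4.x-only relicense
                                                  #    leaves nothing to resolve to

So once every 0.3.x release was yanked, there was simply no version left that satisfied the pin.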

This is a really good reason to stand up your own artifact repository and put all of your third-party dependencies in it, especially if you're a business.


> The absolute worst thing, though, was that changing a license should not be a minor (or a major) version number increase.

The license didn't change. It was always already GPL, due to the usage of GPL-licensed code, regardless of what the metadata said. The change just made the metadata correctly reflect reality.

[EDIT: I should clarify that technically mimemagic wasn't already GPL, but the only legal way to use it was by satisfying your obligations under the GPL, making it effectively GPL. The author did relicense his own code to be GPL instead of MIT.]

To me it seems like making your downstreams aware of that ASAP is pretty important, since this has important legal implications for them as well. Yanking the old versions and releasing an update with an incompatible version number is a way to do that, albeit one that's quite disruptive.


Yeah. That's a better way of putting it. The author didn't opt to change the license. He corrected a licensing error.

I do agree that making the downstream users aware is important; I just don't agree that immediately yanking is the right solution. Putting out a new version would have been nice. Adding a post-install message to the new versions would have been a good way to start getting the word out. I'm not sure how far to take it, but opening issues with dependents (RubyGems provides this information) would have also been nice, giving the major dependents notice before yanking.


After the "left-pad" fiasco, and a similar event on the Ruby side, I started vendoring my dependencies as standard practice. I have not been sorry yet; in fact, I feel vindicated in that approach.



Vendoring in Ruby land is a double-edged sword. It is much safer, as you said. However, if you _do_ vendor, be sure you're running containerized first. Otherwise you will be in a very frustrating spot, having to handle all sorts of native gem issues when trying to run on various computers across dev/test/prod.


Yes, this is a real problem. We primarily use Docker, which solves the issue, but there are people who hate Docker and want to run natively. For the Mac users, that doesn't go too well.


I've lost countless work days to figuring out gem build issues on a Mac when everyone else on the team was running on Linux/Vagrant.


Vendoring is a good first step, too. As long as you have a local copy of all the dependencies, you're better off than needing to pull them from the Internet every time you want them and risking that they're gone - or, potentially worse, replaced by the same version with modifications.
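For the Bundler case specifically, here's a rough sketch of what that looks like, assuming a recent Bundler (the commands sit in comments since Bundler is driven from the shell):

    # bundle config set cache_all true  # cache git- and path-sourced gems too
    # bundle package                    # copy every resolved .gem into vendor/cache
    # bundle install --local            # install from vendor/cache, never the network

    # Quick sanity check that the cache actually holds your dependencies:
    Dir.glob("vendor/cache/*.gem").sort.each { |gem_file| puts gem_file }

Commit vendor/cache alongside your Gemfile.lock, and a yanked gem can't break your builds.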


We get a form of this with our two-stage image building process -- the first stage installs all dependencies, and we only update it when dependencies change.


> The use of GPL'd files means that the gem itself has to be released under the GPL. And since the gem is under the GPL, anything that depends on it is also subject to the GPL.

No, that's not true. You can dual-license dependent software under GPL and MIT. The GPL merely requires a license at least as permissive as it.


> The GPL merely requires a license at least as permissive as it.

No, it requires a license that's at least as permissive as it AND that imposes the same obligations (i.e. source distribution, etc.) on the licensee.

Dual-licensing dependent software under the GPL and MIT only ensures that you can rip out the GPL dependency, and then use the (formerly) dependent software under MIT. The whole package is still GPL and imposes the same obligations on derivatives of the package.


Yes, that's what I'm saying.


You can dual-license if you hold the full copyright, but if you include GPL'd stuff (and don't hold the full copyright) you'll have to GPL the result.

As for "at least as permissive": the GPL requires no further restrictions, but it adds a bunch of restrictions itself. And there's no other license that doesn't add restrictions - MIT adds a requirement to reproduce the MIT license text, which is an extra restriction. The FSF attempts to excuse these restrictions under the "attribution" clause of the GPL, but it is not clear to me that this is valid, and it has not been tested by any court.


I am fairly sure MIT's license is considered an "appropriate legal notice."

