Actually, leaking customer information does not really matter to the company. What the company cares about is that the leak is not their fault. They just want someone else to take the responsibility, so putting data on 'Someone Else's Computer' is actually a good strategy.
Only when losing customer information translates into revenue drops will companies take security more seriously. A law requiring companies that store customer data to follow common security practices is a possible solution, though it would hurt low-budget startups.
Every[1] computer is, at worst, a misconfigured firewall and a late security patch - or, at best, one moron clicking a phishing e-mail away from being 'Some Martian script kiddie's computer'.
Keeping your data in the cloud introduces new attack vectors. It closes others.
If you're going to be an idiot[2][3], and host a production database which doesn't require any authentication on the cloud, I don't have much hope for your non-cloud-deployed security.
[1] For most reasonable definitions of every.
[2] I have had the dubious pleasure of working in a ~15 year-old software company where ~half of the machines were virus-infested (Mostly Conficker, but also some other shit that I forget. This was in 2012. If I recall correctly, we had one part-time IT guy for ~30 engineers. His job was ordering computer parts, and keeping e-mail working. The health of the dev machines was not his problem.) The SOP was to do all your work in a VM running Windows XP, and wipe it every few weeks, or whenever performance would grind to a halt - whichever came first. One of my tasks, a few months in, was to deal with the virus situation on the build server, so that we could 'securely' build the release, and sign it with the encryption key that only the VPs had access to.
[3] I have also had a chance to work in a company obsessed with security (because of the nature of their products). One of my discoveries, a few months before I left, was that one of their products' updates was pushed via an insecure HTTP downloader. While I was there, nobody budgeted time to get it fixed.
The difference with computers in the "cloud" is that there is a much higher incentive to hack them since there is much more data on them, and hackers can be fairly confident that they will get valuable data in a hack. This is not the case with your personal computer. So if you are bad at security, I would argue that it is much safer (from a security point of view, at least) to store things on your own computer, since your mistakes are much less likely to be noticed.
Even if they were Amazon and actually owned the hardware being run on, it wouldn't have changed the outcome at all, so I don't understand how its being someone else's computer is relevant.
Right. In this case, the OCR software dev had control of the cloud machines.
However, the customers that uploaded sensitive documents to this cloud OCR service did not have control of the computers, the code, or the configuration.
Yes, if you don't trust anyone you can't get anything done, but this feels like the kind of task where you should be a little bit nervous each time you do it.
In my experience you have control and insight into machines in AWS; what insight or control do you think was lacking here? It seems to me more that they didn't understand the technology they were building on.
As far as I am concerned, see p1necone's comment above: it's not about the relation between Abbyy and Amazon, it's about the relation between them and their customers, who stored their documents with some company where they had no insight into how their data was being handled.
Another MongoDB misconfiguration, wow. How is it still so easy to configure mongo with no creds and an open port? Feels like these alone cause a large % of data leaks.
If I were running a cloud platform of such a massive scale, I'd probably scan my own ports to identify glaring problems like this one. Kind of surprising that isn't happening, considering how bad it is to have the brand associated with a report like this.
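That kind of self-scan doesn't need much machinery. A minimal sketch in Python (the hosts listed are placeholder documentation addresses, not real instances; a real sweep would enumerate your own fleet's IPs):

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical sweep of your own instances for MongoDB's default port (27017).
for host in ["198.51.100.10", "198.51.100.11"]:  # example/documentation addresses
    if is_port_open(host, 27017):
        print(f"warning: {host} exposes MongoDB on 27017")
```

A production version would rate-limit, scan a port list rather than one port, and feed results into an alerting pipeline, but even this naive check would have flagged the misconfiguration discussed here.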
If you spin up a server and install mongodb (while forgetting to turn on a firewall that blocks all non-port 443/80 traffic) - everyone has root access to your database.
Easy mistake to make. I've probably done it at least once on publicly accessible test instances.
By default, MongoDB doesn't listen on the public interface, so it won't be exposed to the Internet - it only listens on localhost. Old versions of MongoDB had bad defaults, but that hasn't been the case in years.
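For reference, the relevant settings in a modern `mongod.conf` look roughly like this (the file path and values are illustrative, not taken from the incident):

```yaml
# /etc/mongod.conf (typical location on deb/rpm installs)
net:
  port: 27017
  bindIp: 127.0.0.1      # listen on localhost only; never 0.0.0.0 without a firewall
security:
  authorization: enabled  # require authenticated users even for local connections
```

Exposing the database to the open Internet requires changing `bindIp` (or the older `bind_ip` option) away from this default, which is why these leaks usually involve someone deliberately loosening the config without adding auth.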
The defaults on Ubuntu at least have been changed (not sure since when, though) "since release 2.6.0 we have made localhost binding the default configuration in our most popular deployment package formats, RPM and deb" from https://www.mongodb.com/blog/post/update-how-to-avoid-a-mali...
You should really be configuring your AWS security groups with the proper inbound/outbound ports, so this is a failure at an even more basic level: for the machine to be open that fully, someone had to whitelist everything.
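As a sketch of what "proper" looks like, a minimal security group that only admits web traffic might be written like this in Terraform (resource names and the egress policy are illustrative assumptions):

```hcl
# Hypothetical security group: HTTPS/HTTP in, everything else (e.g. 27017) blocked.
resource "aws_security_group" "web_only" {
  name = "web-only"

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Because security groups are default-deny on ingress, MongoDB's port is unreachable from outside unless someone explicitly adds a rule for it.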
Yet AWS still defaults to wide-open security groups for the new Cloud9/CodePipeline instances that my devs create, and then Trusted Advisor tells me about the insecure configuration...
Defaults matter but I think there’s also a lot of blame for developer culture, especially in the circles where Mongo is popular. Faster, faster, ship it…
I think document stores are prone to an early threshold of thinking you've gotten a lot done without having to "waste" time defining models/types, and some people never shake that feeling even after being hip deep in all of the code they're now writing to migrate, validate, or analyze that data.
Our ORM allows this on a relational database with similar 'quick dev start' benefits, but all that pesky structure, validation, and indexing is much easier to add later. Also, when there are perf issues with Postgres and MySQL, it usually takes a few minutes to find and fix the problem; with Mongo, even if you find it, you might be at a loss to fix it.
Allowing unauthenticated access is the default configuration, but I think you have to go out of your way to make it accessible from external systems, let alone by anyone on the open internet...
There is a severe lack of good LOCAL OCR options for documents. The only ones I know are Abbyy (very good and very expensive), OCR.space Local (affordable but not as good), and of course Tesseract. But I feel Tesseract is increasingly left behind with regard to OCR quality and speed.
Aside: this was the OCR engine included with the DEVON(think/note) applications.
Thankfully, the OCR engine processed documents offline. In a seemingly prescient move, adherence to the rule of processing and storing as little data as needed prevented an anxiety-inducing chain of events.
A shame that small teams (I think that team is fewer than 10 devs) are sometimes the only businesses with enough sense to ensure that their exposure to things like this is mitigated or reduced.
This is why the bank where I work developed its own OCR software (and translation and entity recognition, etc.). It simply couldn't afford to have this kind of leak.
If I have the right 'Abbyy', I'm surprised that I don't see any information about this data release on their website. Or, maybe not. Are there disclosure laws that one would expect to come into play here?
The bigger mistake was made before this data breach even happened.