You'd do it at login time. User enters user/pwd -> hash with unsalted sha-1, check if in list -> if yes, alert to change / if no, proceed with normal hashing.
Easy, just convert all the hashes into passwords using a rainbow table. Should only take a few seconds to convert all 6.5M passwords -- O(n) operation here. Then run all the passwords through each user's password algorithm, this is a O(n^2) operation. Essentially you're making 6.5M password attempts for each of your users. It could be slightly faster because I'm sure there are quite a few duplicates in 6.5M passwords.
What's wrong? They exist... they're bigger than md5 tables, but not significantly larger. If you don't have 50GB of free disk space, you could get a table with lower complexity for around 20GB or so.
A cross-reference is only feasible in very bad situations:
- no-salt or same-salt and same hashing
- trivial/common passwords (password1 etc)
- password(hashed/unhashed) and email are paired.
A cross-reference could be accomplished for all known cracked linkedin passwords, but this would be no different then you running a dictionary attack of known passwords against your own users... This seems very bad. Enforcing strong but sane password strength rules should mitigate this need.
Cross reference only has value if both the hash and email pairs are leaked.
The bitcoin leak fell into one of these very bad situations:
- [<email>, <hash>] where leaked together
- poor hashing (just sha1, no salt if memory serves)
- unfortunate number of people reuse passwords
The released passwords are hashed with SHA1. Assuming you use the same algorithm and linkedin does not use a salt (they probably do), then you could just compare the hashes.
LinkedIn passwords are not salted. You can only make comparisons if your database contains unsalted passwords. And if both databases used salted-passwords, then you still can't compare unless you all shared the same salting key.
You can't compare the hashes unless you have access to the clear passwords of your users. Unless you mean to do the comparison just as they log in. Seems like a lot of hassle for not much though.
you'd compare the hashes in your database with those from the file. The users with a hash contained in the file would be notified.
Because the passwords aren't salted(stupid), you might get multiple hits for the same hash(for example, for the good old "1234" password), meaning you might end up contacting more users than actually affected. Better safe than sorry.