I've just downloaded the database linked and it only contains the hashed passwor...

madsr · on June 6, 2012

Keep in mind that whoever leaked the hashes is probably keeping the usernames / emails for themselves. The forum in question doesn't allow posting of user-identifiable information according to the forum guidelines.

The leaked hashes seem to be SHA-1. I've also confirmed that the hash of my own (semi-complex) LinkedIn password is in the list. Coincidentally, this is the same password I had for HN, which I've now changed (phew! THAT'd have been bad! :-)

mjschultz · on June 6, 2012

Doesn't this imply that LinkedIn doesn't salt the password prior to storing it? If so, a good chunk of those passwords will be in a rainbow table.

madsr · on June 6, 2012

Yes. The hash I calculated was without a salt (the same way you generate a hash on sites like http://darrenfauth.com/generators/sha1)
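
A minimal sketch of the check described above, assuming the leaked file holds one lowercase hex SHA-1 digest per line. The function names and the sample data are hypothetical and only illustrate what "hashing without a salt" means:

```python
import hashlib

def unsalted_sha1(password: str) -> str:
    """Return the hex SHA-1 digest of the password with nothing mixed in."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def is_in_dump(password: str, dump_hashes: set) -> bool:
    """Check whether the password's unsalted hash appears in the dump."""
    return unsalted_sha1(password) in dump_hashes

# Hypothetical stand-in for the leaked file (one hex digest per line).
sample_dump = {unsalted_sha1("password"), unsalted_sha1("linkedin")}

print(is_in_dump("password", sample_dump))
print(is_in_dump("some other password", sample_dump))
```

Because no salt is involved, the same password always produces the same digest, which is why anyone holding the file can run this check without knowing a username.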

joesmoe4297 · on June 6, 2012

You can get reflected XSS in that field. Paste "<script>alert('XSS')</script>" in the "Value to sha1" input box.

Darren, you should check out output encoding.

16s · on June 6, 2012

With these sorts of simple hashes, you don't need rainbow tables when you have a few GPUs and OCLHashcat.

veemjeem · on June 6, 2012

It would still take a moderate amount of time for a single password if it's long and complex -- you're essentially generating the rainbow table. You might as well just download a sha1 rainbow table and just perform an O(1) lookup. You could reverse all the 6.5M password hashes in mere seconds.

mrb · on June 6, 2012

Actually, for a large enough list of unsalted password hashes, bruteforcing is faster than rainbow tables:

- a rainbow table may require a constant amount of time to reverse 1 hash, but it has to be repeated N times for N passwords.

- when bruteforcing, a password candidate can be checked against N hashes in a constant amount of time (look up the candidate hash in a hash table)

For example if it takes 10 minutes to look up a hash in a very large rainbow table (such as the A5/1 GSM tables published a few years ago), it would take 123 years to attempt to reverse these 6.5M hashes. On the other hand, millions of the leaked SHA1 hashes can be cracked in mere hours on a GPU with oclhashcat which tests billions of candidate hashes per second.
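
The two bullet points above can be sketched roughly as follows. This is an illustration only: the `crack` function, its tiny alphabet, and the length limit are assumptions chosen so the example runs quickly on a CPU, and it is not how oclhashcat is implemented. The last lines redo the 123-year arithmetic.

```python
import hashlib
import itertools
import string

def crack(target_hashes, alphabet=string.ascii_lowercase, max_len=3):
    """Hash each candidate once and test it against ALL targets
    with a single set lookup (constant time per candidate)."""
    targets = set(target_hashes)
    found = {}
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            digest = hashlib.sha1(candidate.encode("utf-8")).hexdigest()
            if digest in targets:
                found[digest] = candidate
    return found

# Rainbow-table side of the comparison: one 10-minute lookup per hash.
MINUTES_PER_LOOKUP = 10
NUM_HASHES = 6_500_000
rt_years = MINUTES_PER_LOOKUP * NUM_HASHES / (60 * 24 * 365)
print(f"sequential rainbow-table lookups: roughly {rt_years:.0f} years")
```

The cost of the brute-force loop depends on the size of the candidate space, not on how many hashes are in the target set, which is the point of the comparison.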

catch23 · on June 7, 2012

true, for extremely large rainbow tables. SHA1 tables are around 20-60GB depending on how large your base character set is. If you shoved all this data into a giant database, query speed is still under a few milliseconds. In general, rainbow tables can be sharded fairly easily, so if your data set is a few hundred terabytes, just split it across a few machines and you'll retain the millisecond query times. Storing and querying easily partitioned data will usually be faster than a brute force calculation.

Calculating it is like saying you want to find the fibonacci number for any given N, and you have a really fast processor to calculate it to that N, but if you just persisted pre-calculated values up to C, you'd only need to calculate N-C hashes. So even if you are bruteforcing the password, it is still faster to have rainbow tables up to a certain length.

mrb · on June 9, 2012

What I say is true for any size of rainbow table. It seems you forget that RT lookups require CPU resources in addition to mere I/O resources. There is always a number of hashes beyond which brute forcing them is faster than RTs. Sometimes this number is very high (billions of hashes), sometimes it is lower (thousands of hashes). It depends on many factors: RT chain length, speed of the H() and R() functions, speed of the brute forcing implementation, etc.

To take your example of a small SHA1 rainbow table of 20GB, assuming it has a chain length of 40k, looking up a hash in it will require on average 200M calls to the SHA1 compression function (assuming a successful lookup). A modern CPU core can do about 5M calls per second. Therefore looking up one hash will take at least 40 sec, and looking up these 6.5M LinkedIn hashes would take 8.2 years! (This is just counting CPU time, I assume the RT is loaded in RAM for a negligible I/O access time to its data.) A RT of this size would cover a password space of about 2^44. For comparison a decent GPU can brute force this many hashes concurrently at a speed of roughly 500M per second (see oclhashcat perf numbers on an HD 7970). Covering the same password space would take only 9.8 hours. Compare 8.2 years vs. 9.8 hours: obviously the LinkedIn hashes that have been cracked so far have been brute forced, not looked up in RTs!

And even if you leveraged GPUs to perform RT lookups, they would speed up the computations by roughly a factor 100x, reducing the 8.2 years down to 30 days, still unable to match the short 9.8-hour brute forcing session. (My friend Bitweasil is doing research on GPU-accelerated rainbow tables, see cryptohaze.com)
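
The estimates in this comment can be reproduced with a few lines of arithmetic. All inputs below are the figures quoted above (200M calls per lookup, 5M calls per second per core, 6.5M hashes, a 2^44 password space, 500M hashes per second on a GPU, a 100x GPU speedup); the variable names are made up for this sketch.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Rainbow-table lookup cost on one CPU core.
sha1_calls_per_lookup = 200e6
cpu_calls_per_second = 5e6
num_hashes = 6.5e6

seconds_per_lookup = sha1_calls_per_lookup / cpu_calls_per_second  # 40 s
rt_years = seconds_per_lookup * num_hashes / SECONDS_PER_YEAR      # about 8.2

# Same lookups with an assumed 100x GPU speedup.
gpu_rt_days = rt_years * 365 / 100                                 # about 30

# Brute forcing the same 2^44 password space on one GPU.
password_space = 2 ** 44
gpu_hashes_per_second = 500e6
brute_hours = password_space / gpu_hashes_per_second / 3600        # about 9.8

print(f"RT on CPU: {rt_years:.1f} years")
print(f"RT on GPU: {gpu_rt_days:.0f} days")
print(f"Brute force on GPU: {brute_hours:.1f} hours")
```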

philip1209 · on June 6, 2012

As a more general question: why is it not an industry standard to salt with the username/email in addition to the random key? (i.e. Sha1($salt + $email + $password)). Even if the random salt were excluded, I would think that this is much more secure. Existing rainbow tables would not be anywhere near as helpful, and attempts to generate a rainbow table for a specific salted database would be ineffective because the salt changes on a per-user basis.

stephen_g · on June 6, 2012

The solution is to use a better method of storing passwords. Hashes like SHA1 are designed to be really fast (great for hashing data but also great if you want to brute force).

I think this is a pretty good overview: http://codahale.com/how-to-safely-store-a-password/

toyg · on June 6, 2012

Then the password has to be updated whenever your email changes. I believe Amazon does it like that, literally "forking" whenever you change password; at one point it was possible to simply log on with the old password and live an "alternate reality" where all changes you'd done after changing pwd had not been applied. Don't know if it's still the case today.

niekmaas · on June 6, 2012

Why would you use the email? Mostly when passwords/usernames are stolen the email is there too. For my site I have a unique 128-bit token for every user. I also have a 128-bit site_key (which is in the application, not db) and mix those with the password and then hash.
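
A rough sketch of that scheme, under stated assumptions: the comment does not say how the values are mixed or which hash is used, so the simple concatenation, the choice of SHA-256, the `SITE_KEY` value, and the function names here are all hypothetical. As other comments in this thread point out, a single fast hash like this is still cheap to brute force.

```python
import hashlib
import os

# Hypothetical 128-bit site-wide key, kept in application config, not in the DB.
SITE_KEY = bytes.fromhex("00112233445566778899aabbccddeeff")

def new_user_token() -> bytes:
    """Generate a random 128-bit per-user token (stored next to the hash)."""
    return os.urandom(16)

def hash_password(password: str, user_token: bytes, site_key: bytes = SITE_KEY) -> str:
    """Mix the site key and the per-user token with the password, then hash."""
    return hashlib.sha256(site_key + user_token + password.encode("utf-8")).hexdigest()

token_a = new_user_token()
token_b = new_user_token()
# Same password, different users: the stored digests differ.
print(hash_password("hunter2", token_a) != hash_password("hunter2", token_b))
```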

vladd · on June 6, 2012

The economics of password crackers changed and rainbow tables are pretty much obsolete nowadays. See http://www.codinghorror.com/blog/2012/04/speed-hashing.html section "What about rainbow tables?".

mkjones · on June 6, 2012

Interesting - I wasn't able to find the hashes of any passwords in the list. What list were you using?

madsr · on June 6, 2012

The rar with ~100k cracked passwords in it. If you tried to find your own, perhaps you're one of the ~144 million accounts that weren't published?

Edit: I'm not sure I understand what you mean - there were 100k passwords in one file, already cracked, and another with all 6.5M hashes. I found my hash in the hashes file.

mkjones · on June 6, 2012

Ah, I have the 6.5M file. Not sure why I'm not finding stuff from my wordlist in it, but I do see things from e.g. https://twitter.com/mikko/status/210341669944573955. Sorry for the confusion!

Scorpion · on June 6, 2012

Oddly, mine isn't in the leak despite the fact that I just logged in with it.

kevindication · on June 6, 2012

LinkedIn could easily match each hash to a user. Then they should lock each of those accounts and force them to change their password.

carbocation · on June 6, 2012

Which should be done, but which doesn't help those users where it matters most; the real value of this database is that some people (~everyone) reuse passwords across sites.

kevindication · on June 6, 2012

And send them a note too, sure. They've got their e-mail addresses as well so a note of apology and warning is certainly in order.

mibbitier · on June 6, 2012

From the looks of it, the data dump may be all accounts - since there seems to be no salt, and many people use same passwords...

rbanffy · on June 6, 2012

You can use it for checking whether your password was leaked. You don't need usernames for that.

TomGullen · on June 6, 2012

Are the hashed passwords not salted?

ithkuil · on June 6, 2012

You can perform this check even if they were salted.

Otherwise how could linkedin check if you correctly entered your password?

The salt is contained in cleartext as part of the hashed password, so that you can repeat the hashing of the secret and match the two hashes.

The salt improves the security because:

1. even if two users use the same password, you cannot tell that by simply comparing the hashes

2. makes brute force checks much slower because you have to recompute the hash for every hashed password entry rather than once for every dictionary entry

3. Prevents building rainbow tables

(probably other reasons, I'm not a crypto expert)
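
A minimal sketch of the salted scheme described above, assuming a hypothetical `salt$digest` record format. SHA-1 is used only to mirror the hashes discussed in this thread; the sketch shows where the salt lives and how verification works, not a recommended way to store passwords.

```python
import hashlib
import hmac
import os

def make_record(password: str) -> str:
    """Hash the password with a fresh random salt; store the salt in cleartext."""
    salt = os.urandom(16)
    digest = hashlib.sha1(salt + password.encode("utf-8")).hexdigest()
    return salt.hex() + "$" + digest

def verify(password: str, record: str) -> bool:
    """Re-read the salt from the record, repeat the hashing, compare digests."""
    salt_hex, digest = record.split("$")
    salt = bytes.fromhex(salt_hex)
    candidate = hashlib.sha1(salt + password.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, digest)

record_a = make_record("hunter2")
record_b = make_record("hunter2")
print(record_a != record_b)          # point 1: same password, different records
print(verify("hunter2", record_a))   # the site can still check a login
```

An attacker with the records has to repeat the hash for every (candidate, salt) pair, which is point 2 in the list above.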

leftnode · on June 6, 2012

The salt may have been stored in a separate database table and not distributed with this list (if they were salted, which apparently they aren't).

thaumaturgy · on June 6, 2012

No. I was just confirming that myself when I saw madsr's comment: http://news.ycombinator.com/item?id=4073454

_gtly · on June 6, 2012

To get a sense of it, I downloaded it from a link here. Below is the structure of the first few lines. Caveat: it's garbage/useless data below -- I intentionally changed around the actual numbers to give a sense of the structure, only:

000000a94d47b9cb82ca8a3b492a51263b40a66e
000000a98a624314892af97c6f1a0635472eae38
000000a9ba60e7f13fcac444a5a791af7807a3a3
000000a97ea34e74a97a6d1ce08ebc68d3e9aab2
000000a9b4b2a3497aaa51e212ac9efdb00aaf4e

Anderkent · on June 6, 2012

The pattern 000000a9 is just in presentation - I counted the occurrences of different bytes in that position (also misled by the apparent pattern, where many lines in a row would have the same 4th byte), and each possible value is present more or less equally often.

It seems like it's just sha1.

EDIT: however, 3.5 million hashes start with 5 zeroes, which is way too many for just coincidence. Possibly they used multiple hash functions?
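
To put "way too many" in numbers: if SHA-1 output is treated as uniformly random, the chance that a digest starts with five zero hex digits is 16^-5. A quick sketch, using the 6.5M figure from this thread and a hypothetical helper for counting prefixes in a real file:

```python
NUM_HASHES = 6_500_000
PREFIX_HEX_DIGITS = 5

# Expected number of digests with five leading zero hex digits by chance.
expected_by_chance = NUM_HASHES / 16 ** PREFIX_HEX_DIGITS

def count_with_prefix(hashes, prefix="0" * PREFIX_HEX_DIGITS):
    """Count how many hex digests start with the given prefix."""
    return sum(1 for h in hashes if h.startswith(prefix))

print(f"expected by chance: about {expected_by_chance:.1f} of {NUM_HASHES:,}")
```

Chance alone would account for a handful of such hashes, not millions.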

ricardobeat · on June 6, 2012

It appears that the publisher just zeroed out the first 4 digits in nearly all the hashes. The rest of the string still matches known hashes.
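
If the leading digits really were zeroed out, a lookup has to try both the full digest and a masked copy. A hedged sketch: the number of masked digits is an assumption (comments in this thread mention both 4 and 5), and the function name is hypothetical.

```python
import hashlib

def dump_contains(password: str, dump_hashes: set, masked_digits: int = 5) -> bool:
    """Look for the password's SHA-1 digest in the dump, either intact
    or with its first `masked_digits` hex digits replaced by zeros."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest()
    masked = "0" * masked_digits + digest[masked_digits:]
    return digest in dump_hashes or masked in dump_hashes

# Build a one-entry dump holding a masked digest, then look it up.
full = hashlib.sha1(b"linkedin").hexdigest()
sample_dump = {"00000" + full[5:]}
print(dump_contains("linkedin", sample_dump))
```

This would also explain a search that fails when it only compares full, unmasked digests.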

ominous · on June 6, 2012

Found this on reddit: http://www.reddit.com/r/netsec/comments/unubl/if_it_turns_ou...

My password is not in there, but some people have already reported finding theirs.

jere · on June 6, 2012

Agreed. That seems rather useless. How would that happen anyway? The usernames stored in a different database/table from the hashes?

joelhaasnoot · on June 6, 2012

They might need help cracking the hashes, keeping the usernames behind for their own exploits.

nakkiel · on June 6, 2012

Or they may use this as an advertisement for selling the actual dataset.

hoov · on June 6, 2012

LinkedIn allows you to sign in using any of your verified email addresses, so it seems likely that the usernames are at least stored in a different table.

snorkel · on June 6, 2012

... but still, a head wag at LinkedIn for using weak hashing, which I'm guessing means MD5.

DaveChild · on June 6, 2012

MD5 isn't the issue - it's the lack of salting. Without a salt, almost any hash can be cracked with a rainbow table. With a salt, you'd need to know the salt for each hash, and then generate a new rainbow table, in order to recover the original password.

eropple · on June 6, 2012

This isn't really the issue. The real issue is that MD5 (though these hashes are SHA1, which has the same problem) is too easily computed; it is practically brute-forceable. I don't need a rainbow table to compute hashes when I can slam out millions in short order using a GPU. You have a good point about needing to know the salt, but getting the salt is generally easy because it's usually stored in the same place as the hashes (and this practice is fine, because hiding the salts doesn't improve security significantly on its own).

This is a major reason to use bcrypt.

aidos · on June 6, 2012

The difference is that if it's salted you need to work to get a specific password. Without salting you can test a generated hash (rainbow table) against all 6.5 million hashes at the same time.

Not defending the choice - bcrypt is obviously a much better way to go.

eropple · on June 6, 2012

The thing is, though, that it's trivial to slam through that set of salted passwords. It's like unsecured Wi-Fi versus WEP: "door unlocked" versus "'No Trespassing' sign."

romaniv · on June 6, 2012

Let's forget about bcrypt for a second.

What prevents developers from adding a large DB-wide salt (in addition to normal salt) to every password? Wouldn't that prevent bruteforce attacks regardless of the hashing algorithm?

tptacek · on June 6, 2012

Random nonces have very little to do with what makes SHA1 insecure and bcrypt secure. Developers have a very weird and totally misplaced faith in the ability of random "salts" to secure passwords.

romaniv · on June 6, 2012

We're speaking about a very specific attack here: bruteforce. And I'm speaking about a very specific type of "salt" (which could probably be called something else, since it's not the same as normal unique-per-password salt): large, database-wide string of random bytes.

If every password is padded with such a string before hashing, computing the hash would be slower. Obviously, it would be slower because you would have to process more data. An interesting question is whether this would also make it less parallelizable by the virtue of having more information than would fit into GPU cache.

tptacek · on June 6, 2012

None of this makes much sense to me, sorry. Brute-force password cracking has worked on salted passwords since Alec Muffett released Crack in the early '90s. The amount of extra computational power required to hash a password and a salt is negligible.

The only thing "salts" do is prevent rainbow table precomputation, but it's just a quirk of the late '90s and early '00s that "rainbow tables" ever became a mainstream attack method: one bad Microsoft password hash and a series of bad web applications. Long before the MD4 LANMAN hash was ever released, people were breaking salted Unix passwords with off-the-shelf tools, on much, much slower computers than we have now.

romaniv · on June 6, 2012

Computing a hash on 1MB of data is slower than computing a hash of 6-8 bytes of data. Brute-force attacks are based on trying different passwords and seeing that after being salted they generate the same hash as in the database. Therefore, adding a large string to the password before hashing would force the attacker to hash that string. The question is, can this be pre-computed once or efficiently parallelized?

tptacek · on June 6, 2012

You're advocating creating a 1MB "salt" string to slow down hashes? That's the same as simply iterating your hash function enough times to invoke the block function repeatedly.

Just use bcrypt, scrypt, or PBKDF2. People have already figured this problem out.
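
Of the three, PBKDF2 is available in Python's standard library as `hashlib.pbkdf2_hmac`, so it makes for a short sketch. The iteration count, salt length, and function names below are assumptions for illustration, not tuned recommendations.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; raise it as hardware gets faster

def hash_password(password: str, salt: bytes = None, iterations: int = ITERATIONS):
    """Derive a key with PBKDF2-HMAC-SHA256; returns (salt, iterations, key)."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key

def check_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Repeat the derivation with the stored salt and iteration count."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(key, expected)

salt, iterations, key = hash_password("hunter2")
print(check_password("hunter2", salt, iterations, key))
print(check_password("wrong guess", salt, iterations, key))
```

Each guess costs the attacker the full iteration count, which is the slowdown the large-salt idea was trying to get by other means.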

romaniv · on June 6, 2012

First, I do not advocate anything here. I asked a question.

Second, working with a large string of bits is the same as recursive hashing only if you can pre-compute some small intermediate state of the hash function for that string independently from the password you're trying to guess. If you can't, you would have to work with the entire string for every new password tried.
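
Whether that intermediate state can be precomputed depends on where the large string sits. A small sketch with `hashlib`, assuming the salt is prepended to the password and using an all-zero 1 MB placeholder as the salt (both are assumptions for illustration): the hash object that has absorbed the salt can be copied, so each guess only costs the few remaining blocks.

```python
import hashlib

# Hypothetical 1 MB database-wide salt (all zeros here, just a placeholder).
BIG_SALT = bytes(1024 * 1024)

# An attacker who knows the salt absorbs it once and keeps the hash state.
midstate = hashlib.sha256(BIG_SALT)

def hash_with_midstate(candidate: bytes) -> str:
    """Resume from the saved state; only the candidate bytes are processed."""
    h = midstate.copy()
    h.update(candidate)
    return h.hexdigest()

def hash_from_scratch(candidate: bytes) -> str:
    """What the defender computes: the full salt plus the password."""
    return hashlib.sha256(BIG_SALT + candidate).hexdigest()

print(hash_with_midstate(b"hunter2") == hash_from_scratch(b"hunter2"))
```

If the password came before the salt instead, this shortcut would not apply, because the state after the first block already depends on the guess.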

tptacek · on June 6, 2012

I answered your question: using a very large "salt" to force a password hash to run more block functions is a bad idea.

Modern password crackers are extremely fast without precomputing anything.

wglb · on June 6, 2012

How much slower would you estimate it being?

romaniv · on June 6, 2012

1MB of data will have 16384 SHA-256 blocks (64 bytes each). So that's roughly the slowdown I would expect, minus the time it takes to initialize the algorithm for a particular message.

That's not that interesting by itself, but it is interesting to think about how this would affect computing the hashes on GPUs.

wglb · on June 6, 2012

And how high can you crank the work factor for, say, bcrypt?

brown9-2 · on June 6, 2012

Is there a significant time difference in computing the SHA1 hash of 40 bytes versus say, 128 bytes?

romaniv · on June 6, 2012

128 bytes is not "large". I was thinking more along the lines of megabyte+. There is no question that it will slow down hash computations, because you would need to process more data. The question is, can you efficiently parallelize this on commodity hardware (GPUs)?

jlgreco · on June 6, 2012

To be clear, MD5 (or SHA1 as these apparently are) is a problem. Passwords should be stored using a cryptographic hash function that is designed to hash passwords (read: be slow), not a generic cryptographic hash function (which are designed to be fast). This is exactly the problem that bcrypt was created to solve (among others).

jere · on June 6, 2012

I think people are missing the point that SHA2 is light years ahead of MD5. MD5 has had known security flaws for years.

>Do not use the MD5 algorithm Software developers, Certification Authorities, website owners, and users should avoid using the MD5 algorithm in any capacity.

http://www.kb.cert.org/vuls/id/836068

This is from over 3 years ago.

tptacek · on June 6, 2012

The security differences between SHA2 and MD5 are irrelevant to the matter at hand. If they were MD5 hashes they'd be broken approximately as quickly and in exactly the same way.

jlgreco · on June 6, 2012

The primary problem with using either as a password hash is their speed.

jere · on June 6, 2012

I agree, but my point is that the "use bcrypt" drum has only been beating for a couple years to my knowledge: http://codahale.com/how-to-safely-store-a-password/

Wind the clock back 3-5 years and it's still stupid to use MD5. I could kind of understand some old code lying around that was less secure.

veemjeem · on June 6, 2012

Still, it doesn't matter. As long as one can generate a rainbow table for the hash function, then password lookups will be an O(1) operation. The rainbow table for md5 is moderately small, sha1 is bigger, and I'm sure sha2 is even bigger than the sha1 table.

jere · on June 6, 2012

I'm discussing SHA-2 vs MD5. I wouldn't use any hash function without a salt.... which makes the discussion of rainbow tables irrelevant.