Many information security blogs, including this one, have discussed the recent data breach of gossip site Gawker and problems associated with leaked passwords. The story has demonstrated some of the risks associated with password storage. Gawker did store passwords using a form of encryption, but it was a weak algorithm and thus the encrypted data could be cracked. It’s important to remember that you should never simply rely on “encryption” to protect information – that’s sort of like saying a bicycle is protected with a combination lock. Some locks are easier to open than others, and if the lock is attached to a weak cable or not properly looped through the frame of the bike, its strength doesn’t even matter.
With passwords, though, another option is available: one-way hashes. A hash function takes an input of data, such as a password, and outputs a value that’s always the same length and format. The algorithm is designed so that it’s easy to calculate a hash, but essentially impossible to reverse the process. Also, slight adjustments to the input drastically change the output value, and the chance of two different inputs producing the same hash is extremely small. To use another analogy, think of a person’s fingerprint. It’s easy to capture a fingerprint using an ink pad and paper. But if you start with a fingerprint and want to identify the person it came from, you’re at a loss without a database of records to check. And once again, finding two identical fingerprints from two different people would probably never happen.
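To make those properties concrete, here is a small Python sketch using the standard library's hashlib. SHA-256 and the helper name sha256_hex are choices made for illustration only, not something the Gawker story involves.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string (illustrative helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Inputs of very different sizes produce digests of the same length.
short_digest = sha256_hex("a")
long_digest = sha256_hex("a much longer input string " * 100)
print(len(short_digest), len(long_digest))  # both are 64 hex characters

# A one-character change in the input yields a completely different digest.
near_digest_1 = sha256_hex("password1")
near_digest_2 = sha256_hex("password2")
print(near_digest_1)
print(near_digest_2)
```

Nothing in hashlib offers a way to go from a digest back to its input; the only route is to guess inputs and hash them.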
If an application stores the hash of a password instead of the actual password or a value generated by reversible encryption, then theoretically, the password would remain safe if the database were ever breached. When a user tries to log in, the application simply generates a hash of the supplied password (remember, generating hashes is easy) and compares it against the stored hash. If they match, the user has given the right password. If not, the password is wrong.
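The login check described above might look roughly like this in Python. The function names are hypothetical, SHA-256 is an assumed choice of algorithm, and the constant-time comparison via hmac.compare_digest is an extra precaution not mentioned in the text.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Hash a password with SHA-256 (assumed algorithm, for illustration)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def check_login(supplied_password: str, stored_hash: str) -> bool:
    """Hash the supplied password and compare it with the stored hash."""
    candidate = hash_password(supplied_password)
    # compare_digest avoids leaking, through timing, how much of the hash matched.
    return hmac.compare_digest(candidate, stored_hash)

# At signup, only the hash is saved; the password itself is never stored.
stored_hash = hash_password("correct horse")

print(check_login("correct horse", stored_hash))  # True
print(check_login("wrong guess", stored_hash))    # False
```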
Just as people have built databases of human fingerprints, however, databases of hashes exist for common values, so only using a hash would not protect users with simple passwords. Weaknesses have also been found in older hash algorithms, such as MD5. SHA-1 is stronger than MD5 but has known weaknesses of its own; better options are the various versions of SHA-2, but they are still not sufficient on their own. Extra protection comes from adding “salt.”
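A toy version of such a hash database shows why a bare hash does little for a simple password. The four-entry password list below is invented for the example (real tables hold millions of entries), and SHA-256 is again an assumed algorithm.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string (illustrative helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical attacker-side table: hash -> password, computed once in advance.
common_passwords = ["123456", "password", "qwerty", "letmein"]
lookup_table = {sha256_hex(p): p for p in common_passwords}

# A hash taken from a breached database can be reversed by a dictionary lookup.
leaked_hash = sha256_hex("letmein")
recovered = lookup_table.get(leaked_hash)
print(recovered)  # letmein

# A password that is not in the table is not recovered this way.
print(lookup_table.get(sha256_hex("k9#Tq!2zLw")))  # None
```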
In this context, salt refers to an extra string of random information that’s unique for each saved record. This salt is then concatenated with the password and a hash is generated for the entire new string. The salt needs to be saved along with the hash in the database so that login passwords can still be verified, but it should still be kept secret as much as possible. When a user logs in, their supplied password is concatenated with the salt, hashed, then checked against the stored hash.
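A minimal sketch of that scheme follows. It assumes a 16-byte salt from os.urandom, the salt placed before the password, and a single pass of SHA-256; none of these details come from the text, and the function names are hypothetical.

```python
import hashlib
import hmac
import os

def new_record(password: str) -> tuple[bytes, bytes]:
    """Create the (salt, hash) pair that would be saved for a new user."""
    salt = os.urandom(16)  # random and unique per record (16 bytes is an assumption)
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest

def verify(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the supplied password with the stored salt and compare."""
    candidate = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return hmac.compare_digest(candidate, stored_digest)

# Both values are stored in the user's database row; the password is not.
salt, stored_digest = new_record("hunter2")

print(verify("hunter2", salt, stored_digest))  # True
print(verify("hunter3", salt, stored_digest))  # False
```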
With this system, an attacker who manages to break into the database will only recover salted hashes instead of actual passwords. The nature of hash algorithms means that even if a user had a simple password, the salt helps ensure that their hash won’t match any found in common hash databases. To figure out each password, an attacker would have to compute all possible values with each individual salt, vastly multiplying the amount of computation required.
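The effect can be seen in a few lines of Python: two users with the same weak password end up with different stored hashes, and neither appears in a table precomputed without salts. As before, SHA-256, the 16-byte salt, and the helper names are assumptions made for the example.

```python
import hashlib
import os

def salted_hash(password: str, salt: bytes) -> str:
    """Hash salt + password with SHA-256 (illustrative helper)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

# Hypothetical precomputed set of unsalted hashes for common passwords.
precomputed = {
    hashlib.sha256(p.encode("utf-8")).hexdigest()
    for p in ["123456", "password", "qwerty"]
}

# Two users pick the same weak password but receive different random salts.
salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = salted_hash("password", salt_a)
hash_b = salted_hash("password", salt_b)

print(hash_a == hash_b)       # False: the salts differ, so the hashes differ
print(hash_a in precomputed)  # False: the unsalted table no longer applies
print(hash_b in precomputed)  # False
```

An attacker can still guess "password" for each user, but every guess has to be hashed again with that user's salt, so work done against one record does not carry over to the next.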
Of course, just as toothpaste manufacturers remind buyers that their products are only one component of good dental health, salted hashes are only one part of a secure application. In fact, with technologies such as OpenID, OAuth, and Facebook Connect, many sites really don’t even need to handle user passwords any more. But if your application does require its own authentication, a robust implementation of salted hashes ought to be a baseline for password security.