Password handling: A primer
Thunder Raven-Stoker
Apr 8th, 2015
I'm still seeing MD5() and SHA1() being recommended for hashing passwords online. So I thought a primer might help shed some light on a better way to do it.
PLEASE NOTE: This is just intended to be a primer to point you in the right direction towards achieving good practice with user passwords. Please do check everything with your own research to confirm this information to your own satisfaction.
What is a password handling system?
This seems like much too basic a question but I do think that it's worth discussing to clarify a few things.
In practice, a password handling system will accept a user's password, combine it with something called a salt, and then process it in such a way that the password can never (hopefully) be discovered.
At its heart then, a password handling system is something designed to keep secrets. Or rather, just one secret - the user's password. If it succeeds at doing that, then it's a good system. After that it only has one other job - being able to confirm when it has been given the correct password and rejecting everything else.
Please do note that a common mistake developers make is thinking that the system itself has to be kept secret too, but this is wrong. If a system for keeping secrets requires that it is itself kept secret, it cannot be considered very good at its job.
Is the lock on your front door in public view? Or does it have to be hidden away to function properly? The locks on most people's front doors are out there in plain sight.
Instead, what we are looking for is a system that has been made publicly available. As crazy as that sounds, when such a system can be picked over and scrutinised by security professionals and they can't find any holes in it, then that's a good system.
Therefore, you should not fall into the trap of thinking that creating your own secret system in code will be better than a publicly approved one. However, this is such a common problem, that I wrote this related article
You might also like to confirm that this is true by reading about Auguste Kerckhoffs' Principles
Encryption or hashing?
First we need to look at the distinction between the two.
Encryption is the process of turning an input (say a password in this case) into an unintelligible string of characters. That would seem to serve our purpose initially. However, encryption is a reversible process. This means that an encrypted password can subsequently be decrypted in order to retrieve the original password. Since a password system is designed to keep the user's password secret, it shouldn't provide the ability for that secret to be uncovered. Consequently, encryption is not the best answer for keeping passwords safe.
Nevertheless, another very common mistake that inexperienced developers tend to make is to think that they have to decrypt that unintelligible string so that they can get the original password back. This is so that they can compare it with a password that has just been supplied by someone logging in. This is very wrong. Remember, our system should not be able to discover the secret (the password) because then it's no longer secret. The secret has to remain in only one location: the user's head.
Hashing on the other hand is a one-way process. You provide the input (the password and a salt) to the hashing function and it will similarly produce an unintelligible string of characters. The key difference here is that there should be no way to take that hashed string of characters and reverse engineer it back to the original password. Hashing then is the way to go.
When we have the hash made from the user's original password stored in the database, we can use the hash (not the password) for comparison. What we need to do is construct another hash using the original salt and the newly offered password. If the new hash is the same as the original hash, it means we must have been given the right password by the user, even when we don't know everything that was used to create the original hash.
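The comparison described above can be sketched in a few lines. This is only an illustration: PBKDF2 stands in for "the hashing function", and the function names, the 16-byte salt and the iteration count are my own assumptions rather than recommendations.

```php
<?php
// Illustrative sketch only: PBKDF2 stands in for "the hashing function".
// makeHash(), isCorrectPassword() and the iteration count are assumptions.

function makeHash(string $password, string $salt): string
{
    // hash_pbkdf2() mixes the salt and password and returns a hex string.
    return hash_pbkdf2('sha256', $password, $salt, 100000);
}

function isCorrectPassword(string $offered, string $salt, string $storedHash): bool
{
    // Rebuild the hash from the stored salt and the offered password.
    // hash_equals() compares in constant time to avoid timing leaks.
    return hash_equals($storedHash, makeHash($offered, $salt));
}

// Registration: create a salt, hash the password, store salt + hash only.
$salt       = random_bytes(16);
$storedHash = makeHash('HelloWorld', $salt);

// Login: the password itself is never stored or recovered.
var_dump(isCorrectPassword('HelloWorld', $salt, $storedHash)); // bool(true)
var_dump(isCorrectPassword('helloworld', $salt, $storedHash)); // bool(false)
```

Notice that nothing in the login step needs the original password back. Only the salt and the hash are ever read from storage.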
What is a salt?
I mentioned a salt up there so now it's appropriate to look at what it is. A salt is just a simple string of characters that we add to the user's own password before passing the combined result to the hashing function.
In the most basic sense, a salt's job is to improve the secrecy of the user's password, but it can't be any old string of characters.
Consequently, we need to consider the two requirements that every salt must honour.
Be unique in the system
Every password should be given its own unique salt. Even without knowing why just yet, it makes sense if you think about it. If every password had exactly the same salt added to it, well, the salt might as well not be there in the first place. When a salt is unique to just one password in our system it makes it impractical for a hacker to use a rainbow table (or even generate their own).
Whilst not entirely accurate, it's good enough to say that a rainbow table is just a big database that connects possible passwords with their corresponding hashes. If a hacker has the hash from your database, he can use it to find the password that makes that hash.
But when each password is given its own unique salt, then the hacker can only use a rainbow table to crack just the one password. If your application has 1,000 users, that would mean making 1,000 rainbow tables. Not worth the effort when brute force attacking is quicker, cheaper and less effort.
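To see why a single pre-computed table stops being useful, here is a small sketch of two hypothetical users who pick the same password. The user names, the 16-byte salt length and the use of PBKDF2 as the hashing function are all assumptions made for the example.

```php
<?php
// Two hypothetical users who happen to choose the same password.
$password = 'HelloWorld';

// Each account gets its own random salt (16 bytes is an assumed length).
$saltAlice = random_bytes(16);
$saltBob   = random_bytes(16);

// PBKDF2 is used here purely as a stand-in hashing function.
$hashAlice = hash_pbkdf2('sha256', $password, $saltAlice, 100000);
$hashBob   = hash_pbkdf2('sha256', $password, $saltBob, 100000);

// Different salts give different hashes for the identical password.
var_dump($hashAlice === $hashBob); // bool(false)
```

A table built against one of those salts tells the hacker nothing about the other account.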
Be random and unpredictable
This is necessary because humans are rubbish at being random and unpredictable. This is the primary reason for providing a salt in the first place. Humans tend to favour passwords that are short and easy to remember. Because of this, we can improve the unguessability of a password by joining it with a nice, long and unpredictable string of characters.
Let's just look at that for a moment
<?php
$humanPassword = "HelloWorld";
$saltedPassword = 'cjkf-e9SF3$%9asq2WQhg!j?77&f' . $humanPassword;
Remember that the salt should be unique for every password, so please don't copy that.
Nevertheless, it should be quite easy to see that the $saltedPassword is a much more unguessable string than the plain old human one. The unpredictability in the salt is there to improve the unpredictability in the combined password and therefore help cancel out the very predictable password supplied by our human user.
Predictability is what makes dictionary attacks possible. A dictionary attack is basically a way of trying every word in the dictionary as a possible candidate for the password. It gets more complicated than that though. Mixing two words together is easy. Mixing two words and a couple of numbers together is also easy. Mixing two words, a couple of numbers and a predictable salt together is also easy. When a hacker can try millions of combinations within the space of one second, we should do everything we can to not make it easy.
Side note: Since humans are quite predictable, humans should not be responsible for creating salts, so that predictability cannot creep into the salt values. As you might imagine, there are a number of ways to generate a good quality salt already available, none of which require human input.
At this point, I would like to illustrate another very common mistake that developers make. Developers often use time as a basis for randomness. For sure, the exact millisecond when a particular user clicks that register button may very well be extremely random. BUT time itself is exceedingly predictable - it progresses in a very linear fashion from one second to the next. If a hacker knows everything about your system except the users' passwords (the only necessary secret), he knows you're using time as a source of randomness and he knows the order in which your users registered. When you use time in your password handling, you're making it so much easier to crack your passwords. Again, resorting to a publicly approved source of randomness and unpredictability for your salts is your best bet.
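The difference can be shown with a short sketch. The timestamp below is a made-up registration time, and seeding mt_rand() from it is just one assumed example of "using time"; random_bytes() is PHP's interface to the operating system's cryptographically secure generator.

```php
<?php
// A generator seeded from a timestamp is repeatable: anyone who can guess
// the registration time can regenerate the same "random" value.
$registeredAt = 1428451200; // hypothetical registration time (Unix seconds)

mt_srand($registeredAt);
$saltSeenByServer = mt_rand();

mt_srand($registeredAt); // the attacker tries the same second
$saltGuessedByAttacker = mt_rand();

var_dump($saltSeenByServer === $saltGuessedByAttacker); // bool(true)

// random_bytes() draws from the operating system's CSPRNG instead,
// so there is no seed for an attacker to guess.
$salt = bin2hex(random_bytes(16)); // 16 bytes, shown as 32 hex characters
echo $salt, PHP_EOL;
```

Since registration times can usually be narrowed down to a small window, the time-seeded value leaves only a handful of candidates to try.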
Since the salt should be unique to every password, we should remember that we need to store the salt somewhere. Storing it with the resulting password hash is the most convenient location but not entirely necessary.
Remember, the only secret that our password handling system must keep is the one provided by our user. That also means that the password handling system should still be nice and strong even when a hacker discovers what the salt is.
Keeping the salt secret is desirable of course but it should not be essential to the effectiveness of the system. We need to be able to assume that even when the bad guys know everything about our system except the actual password our system is still good at keeping that secret safe.
This is the limit of the salt's usefulness though - to make it unproductive to the hacker for generating rainbow tables.
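Storing the salt right next to the hash is what PHP's bcrypt output already does. The sketch below pulls one such string apart; the cost of 10 is set explicitly only so that the prefix is predictable, and the variable names are mine.

```php
<?php
// password_hash() generates the salt itself; the cost of 10 is set explicitly
// here only so the output format is predictable for this example.
$hash = password_hash('HelloWorld', PASSWORD_BCRYPT, ['cost' => 10]);

// A bcrypt string is 60 characters: "$2y$", the two-digit cost, "$",
// then 22 characters of salt followed by 31 characters of hash.
$prefix = substr($hash, 0, 7);  // algorithm identifier and cost
$salt   = substr($hash, 7, 22); // the encoded salt, stored in plain sight
$digest = substr($hash, 29);    // the hash itself

echo $prefix, PHP_EOL;       // $2y$10$
echo strlen($hash), PHP_EOL; // 60
```

Anyone who steals the database gets the salt along with the hash, which is why the system has to stay strong even when the salt is known.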
Brute forcing password hashes
If we've now made it impractical for pre-computing a database of passwords and their corresponding hashes, what's left are brute force attacks and, if the salt has been discovered, possibly dictionary attacks. A discovered salt is no longer unpredictable in and of itself, but if the progression from one salt to the next is unpredictable it stops the hacker from devising a formula for cracking more than one password at once. The length of the salt and the randomness of the characters in one individual salt is therefore less important than the unpredictability between one salt and the next.
In simplistic terms, a brute force attack is one where a hacker will try every possible combination of characters as a password until he gets a match on the resulting hash that he's produced. Note that this does not have to be the original password, just one that produces the same hashed output.
We can make this job easier for the hacker by stating on our website that a password should be between, say, 6 and 20 characters, contain at least one number and one upper case letter. This is another case where webmasters often get it wrong.
We definitely should encourage the user to choose a strong password that they haven't used on another web site. We definitely should not restrict the user's choices through validation with the sole exception of specifying a minimum length.
Whenever we validate a user's password against arbitrary rules for anything other than a minimum length, we are revealing part of the formula to a hacker. For instance, if we specify that a user's password must contain at least one number, the hacker is able to immediately discard the billions of possibilities that do not contain a number, making his job quicker and easier.
Encourage good password choices, but don't validate against bad ones, with the sole exception of the minimum length.
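As a minimal sketch, validation along those lines might look like this. The constant name, the function name and the minimum of 8 are all hypothetical; pick a minimum that suits your own application.

```php
<?php
// MIN_PASSWORD_LENGTH and isAcceptablePassword() are hypothetical names;
// the minimum of 8 is an assumption, not a recommendation.
const MIN_PASSWORD_LENGTH = 8;

function isAcceptablePassword(string $password): bool
{
    // Only the length is checked: no required digits, symbols or
    // upper case letters are demanded of the user.
    return strlen($password) >= MIN_PASSWORD_LENGTH;
}

var_dump(isAcceptablePassword('short'));                        // bool(false)
var_dump(isAcceptablePassword('correct horse battery staple')); // bool(true)
```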
For a good password system then, we should use one that, as close as possible, will produce a unique hash for every possible password that our users supply. Doing so would mean that for all practical purposes, a hacker will only be able to generate a matching hash when he has the right password.
Of course, he doesn't try every possible combination by hand. Not when he has superior computing power at his fingertips. A simple, medium spec graphics card has enough processing power to generate millions of password guesses every single second. A dedicated hacker will have more than one graphics card to use though, and there are currently articles on the internet describing a setup of 25 graphics cards that is capable of 380 billion password guesses a second.
That's a lot of guesses in no time at all.
For this reason, we need to make the process of generating a hash "computationally expensive".
There are good hashing algorithms and there are bad hashing algorithms. It's important to know what makes a good hashing algorithm.
Our starting point is that the hashing algorithm should be "computationally expensive". What this means in simple terms is that the task of producing the final hash should take so much computational processing as to make it a (relatively) slow old job just to try one possible password.
What we still need to bear in mind though is that the hacker will have more computing power at his disposal than you. Even so, if it's slow on your servers, it will still be relatively slow on the hacker's superior machines. When the algorithm is deliberately slow, the millions of password guesses per second should drop to only thousands of password guesses a second.
This is significant since it will take the hacker much longer to yield password choices for your site.
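Some rough arithmetic shows how big the difference is. The scenario is my own assumption: every 8-character password made from the 95 printable ASCII characters, attacked once at the 380 billion guesses a second mentioned above and once at an assumed 5,000 guesses a second against a slow hash.

```php
<?php
// Assumed scenario: 8-character passwords drawn from the 95 printable
// ASCII characters.
$keyspace = 95 ** 8; // every possible 8-character password

$fastRate = 380e9; // guesses per second on the 25-card setup
$slowRate = 5000;  // "thousands" per second against a slow hash (assumed)

$hoursFast = $keyspace / $fastRate / 3600;
$yearsSlow = $keyspace / $slowRate / (3600 * 24 * 365);

printf("Fast hash: about %.1f hours to try everything\n", $hoursFast);
printf("Slow hash: about %.0f years to try everything\n", $yearsSlow);
```

Under those assumptions the fast hash falls in a matter of hours, whilst the slow one would take tens of thousands of years to exhaust.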
Keep in mind that our user will (hopefully) remember their password, so our servers should only need to try the hashing process once to confirm that it's the right one. If the process on our servers takes 0.2 seconds, our user should not notice the delay since it only happens when they log in. A hacker, even with superior processing power, will notice the delay if he's not getting millions of password tries per second.
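One way to find a suitable setting is simply to time the hash on your own server. The sketch below uses PHP's bcrypt support as the slow hash and looks for the lowest cost that takes at least 0.2 seconds; the starting cost of 8 and the ceiling of 15 are assumptions that keep the search short.

```php
<?php
// Find the lowest bcrypt cost that takes at least $target seconds on this
// machine. The starting cost of 8 and the ceiling of 15 are assumptions.
$target  = 0.2;
$elapsed = 0.0;

for ($cost = 8; $cost <= 15; $cost++) {
    $start = microtime(true);
    password_hash('benchmark-password', PASSWORD_BCRYPT, ['cost' => $cost]);
    $elapsed = microtime(true) - $start;

    if ($elapsed >= $target) {
        break; // this cost is slow enough
    }
}

$cost = min($cost, 15); // stay at the ceiling if the target was never reached
printf("Suggested cost: %d (last run took %.3f seconds)\n", $cost, $elapsed);
```

The result depends entirely on the hardware, so it is worth running again whenever the servers change.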
We need a password handling system that is simple, has been open to the public so that it can be widely tested and checked, gets approval from security experts around the world, provides a computationally expensive hashing process and is good at keeping that all important secret.
One thing you can easily check online is that md5() and sha1() are absolutely no good for password hashing.
The reason why is that they are specifically designed to be fast, not slow. And the reason for this is that they are digest hashing algorithms. These let you check whether a document or file has been tampered with. If you were to email a file to another person and you provided the md5 hash of the file, the other person would be able to make their own md5 hash of the file at their end nice and quickly and then compare their md5 to the one that you sent. If there's a match, then it's very unlikely that the file was tampered with along the way.
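That file checking job looks something like the sketch below. The temporary file and its contents are made up for the example; md5_file() is the built-in PHP function for hashing a file's contents.

```php
<?php
// Sender: write a file and publish its digest alongside it.
$path = tempnam(sys_get_temp_dir(), 'demo'); // throwaway file for the example
file_put_contents($path, "Quarterly report\n");
$publishedDigest = md5_file($path);

// Receiver: recompute the digest and compare it with the published one.
$intactMatches = (md5_file($path) === $publishedDigest);
var_dump($intactMatches); // bool(true)

// Any change to the contents produces a different digest.
file_put_contents($path, "Quarterly report (edited)\n");
$editedMatches = (md5_file($path) === $publishedDigest);
var_dump($editedMatches); // bool(false)

unlink($path); // tidy up the throwaway file
```

Speed is exactly what you want for that job, and exactly what you don't want when a hacker is guessing passwords.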
So we need one that is slow.
The best option out there at the moment seems to be scrypt. This is the hashing algorithm that cryptocurrencies such as Litecoin employ (Bitcoin itself uses SHA-256). It is very computationally expensive though (not least memory-wise) so it isn't always practical to use it in a web application (expensive scrypt hashing on a server designed for sending web pages will upset your users by making your web site seem unresponsive and slow).
The commonly accepted minimum standard for password hashing in online applications is a bit of a two way fight between bcrypt and PBKDF2.
Since bcrypt has been studied to death by the professionals, provides a mechanism for tuning how expensive it is (so if you upgrade your servers, you can still make the hashing process "slow"), can automatically provide very good quality salts for you and PHP provides an easy way to use it, this is the one that I would recommend. You can see how easy it is to use in my other article here
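For a quick taste, here is a minimal sketch using the PHP password_* functions. The cost values of 12 and 13 are assumptions for the example, and in a real application the password would come from the login form rather than being written in the code.

```php
<?php
// Registration: bcrypt picks a random salt and embeds it in the result.
// The cost of 12 is an assumed value; tune it for your own servers.
$options = ['cost' => 12];
$stored  = password_hash('HelloWorld', PASSWORD_BCRYPT, $options);

// Login: password_verify() reads the salt and cost back out of $stored.
$loginOk  = password_verify('HelloWorld', $stored);
$loginBad = password_verify('WrongGuess', $stored);
var_dump($loginOk);  // bool(true)
var_dump($loginBad); // bool(false)

// After a server upgrade, raise the cost and rehash at the next login,
// which is the only moment the plain password is available.
$newOptions = ['cost' => 13];
if ($loginOk && password_needs_rehash($stored, PASSWORD_BCRYPT, $newOptions)) {
    $stored = password_hash('HelloWorld', PASSWORD_BCRYPT, $newOptions);
}
```

There is no salt handling anywhere in that code because password_hash() takes care of it.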
Thank you for reading. I hope you found the article useful and informative. Password hashing is just one piece of the security puzzle, but it's still a very important one at that.
If you found this article helpful, please do share a link to it in your blogs and social media feeds so that it might help others too.
In closing, please do be as careful as you can with users' passwords and search out the right advice online. Even though I recommend bcrypt hashing through the PHP password_* functions, I also recommend that you research the points made in my article via your favourite search engine so that you can confirm for yourself that this is true. It is, in no way, intended to be a complete guide to the topic.
Feedback and comments are always welcome.