Progress Bar

Article Read Time

1295
Words
7759
Characters
13 min
Read

Before we start, try to forget everything you know about passwords. Rules like “Passwords must include a capital letter and a number” or those little “strength meters” that give you the reassuring green checkmark that your password is impenetrable are dangerously misleading and incomplete. Some of the reasons are mathematical, but a lot of it has to do with human behavior as well. Let’s take a look at the math behind the curtain, and why an understanding of human habits can defeat the most powerful algorithms.

When you type your password and press enter to log in to an account, your web-browser runs code that hashes your password, converting it into a long string of seemingly random letters, numbers, and symbols- from which it is impossible to tell what the original input was. This hash is stored on the server, so each time you log in, the hashes are compared- not your plain password. This way, only you know your password- not even system administrators can see it, and if the server is breached by hackers, they will only see the garbled-looking hash.

The crux of this process is the hash function —the specific mathematical operations that change your password into this gibberish. Hash functions can be quite complex, but the basic idea is something pretty familiar. Let’s say our function is just simple addition, and our input is 2 integers as the ‘password’. You set a ‘password’ of 3 and 4. This is ‘hashed’ to 7 (3 +4 =7). Only this 7 is stored on the server. If a hacker gains access to the server and finds this ‘hash’, he has no way of knowing whether your ‘password’ is 3 and 4, or 1 and 6, or 5 and 2. This is a crucial point: you cannot conclusively determine the original input from the output. Hash functions are one-way. (This is different than encryption, where you can easily regain the original input). Now, this example of addition as a hash function is incomplete- if the server just knows your numbers should equal 7, then any 2 numbers that sum to 7 will work to guess your ‘password.’ This is called having many collisions- when two or more inputs result in the same hashed output. So, we need to add some transformation that results in as few collisions as possible, but still avoid any reversible operations. Let’s try concatenating the original digits, and multiplying the resulting number by our sum. So for 3 and 4, we’d have 3 + 4 = 7, then 7 * 34 = 238. For 5 and 2, we’d have 5 + 2 = 7, then 7 * 52 = 364.
Remember, any guess that we make has to be in the form of 2 numbers, so we’d have to calculate the entire answer for each set of 2 numbers we want to guess. So what two numbers give us the output 1305?

You could probably figure it out eventually with a calculator, or faster by writing a program to guess and check, but it would take some work. This is basically the idea behind modern password hashing. Modern hash functions are highly complex, and take a computer a relatively large amount of energy to calculate. So even when we know what the formula is and what the output hash is, we have to invest such a large amount of computational time to guess that it becomes prohibitive to try. As we’ll explore below, it could take a computer easily thousands of years to guess a password. In fact, it’s been shown that with certain hash functions, the heat-death of the universe would occur before we could calculate all possible hashes with current global computing power.

The only way to “crack” a password is to calculate the hash for a given input, and see if it matches the hash you’re trying to crack- i.e. guess and check. So for example, say the hash function called MD5 takes 3000 CPU cycles to calculate (the actual number varies by formula and computer system), and maybe it takes 10 cycles to compare and see if the hashes are the same… so 3010 cycles to check one hash. If you have a 10 character password, there’s 72¹⁰ possible permutations of lowercase letters, capital letters, digits, and symbols (possibly more depending on what symbols are allow as input)- about 3.75 quintillion possibilities. So with one core of a typical 3GHz CPU and about 3000 cycles per hash, we have 3 billion cycles per second divided by 3000 cycles per hash = 1 million hashes per second. One quintillion is a thousand billion millions, so 3.75 trillion seconds to calculate all possible combinations…which is about 118,833 years of constant CPU time at 1 million hashes per second. We’ll come back to this in a moment.

It might help to understand exactly what happens in a hash formula, and why that takes CPU cycles. The short answer is, it’s just a bunch of math operations. In a modern PC CPU, you have registers which hold 64 bit binary numbers, and the CPU can perform basic logic operations on those, like AND-ing two numbers together, or comparing them for equality. To do that, enough electricity has to be run through the transistors that make up the CPU to cause the electrons to move a “step forward” through the logic gates- similar to how a certain amount of electricity is required to make an electric motor revolve one time. So, a hash formula typically involves doing enough math to require at least a few thousand of these operations.

Most real hash functions are quite intensive, and guarantee unique outputs for each input, etc. But modern CPUs have many cores, and you can also leverage the cloud….so say we have a bunch of cloud servers that totals 100 cores at 3GHz- using our example above of 118,833 years of single-core CPU, that would get us down to 1188 years….say we have a giant server farm with ten-thousand cores, we can get it down to 0.1188 years (about 6 weeks). That’s for every possible hash, of course- i.e. we would have cracked every single possible password. If we were just trying to find a single hash, we might get lucky and crack it in perhaps a measly couple decades.

In reality, we don’t need to guess all the possibilities. Most people use combinations of letters and numbers that relate to their lives- e.g. English words and numbers from their birthdate- so we can vastly reduce the amount of guesses we need to take. For example, we could feed a list of every word in the English language into the popular cracking program Hashcat, and instruct it to calculate hashes for each word followed by the numbers 0 to 999. So rounding up to 200,000 words in English, plus 1000 following numbers for each one is 200,000 *1000 = 200 million possibilities, which would take 200 seconds at 1 million hashes/second. Not too bad.

Of course, most sites have “strength” requirements, like “your password must include at least one number, one capital letter, and one symbol.” Mathematically this is a great idea, but it doesn’t take into account what people actually do, and consequently these “strength” rules play right into hackers’ hands. A rule like this makes it harder for people to make a password they can remember, so we tend to head straight for common patterns and sequences. Most people will just capitalize the first letter of a word, put a common number like 123, 456, or a recent year, on the end, then a symbol that is easy to reach while holding the shift key- typically an exclamation point, at-sign, pound-sign, or asterisk (Yes, the keyboard layout affects how people choose passwords- “asdf” and “qwerty” are some recognizable common patterns). So in practice, these requirements actually make passwords LESS safe- sure, they prevent absolutely terrible passwords like “password”, but overall they make even complex passwords much easier to guess.