Page135

Hash Functions

A hash function provides encryption using an algorithm and no key. They are called “one-way hash functions” because there is no way to reverse the encryption. A variable-length plaintext is “hashed” into a fixed-length hash value (often called a “message digest” or simply a “hash”). Hash functions are primarily used to provide integrity: if the hash of a plaintext changes, the plaintext itself has changed. Common older hash functions include Secure Hash Algorithm 1 (SHA-1), which creates a 160-bit hash, and Message Digest 5 (MD5), which creates a 128-bit hash. Weaknesses have been found in both MD5 and SHA-1; newer alternatives such as SHA-3 are recommended.

Collisions

Hashes are not unique, because the number of possible plaintexts is far larger than the number of possible hashes. Assume you are hashing documents that are a megabit long with MD5. Think of the documents as strings 1,000,000 bits long, and the MD5 hash has a string 128 bits long. The universe of potential 1,000,000-bit strings is clearly larger than the universe of 128-bit strings. Therefore, more than one document could have the same hash: this is called a collision.

While collisions are always possible (assuming the plaintext is longer than the hash), they should be very difficult to find. Searching for a collision to match a specific plaintext should not be possible to accomplish in a reasonable amount of time.

MD5

MD5 is the Message Digest algorithm 5, created by Ronald Rivest. It is the most widely used of the MD family of hash algorithms. MD5 creates a 128-bit hash value based on any input length. MD5 has been quite popular over the years, but weaknesses have been discovered where collisions could be found in a practical amount of time. MD6 is the newest version of the MD family of hash algorithms, first published in 2008.

Secure Hash Algorithm

Secure Hash Algorithm is the name of a series of hash algorithms. SHA-1 creates a 160-bit hash value. Like MD5, SHA-1 was also found to have weak collision avoidance. SHA-2 followed and includes SHA-224, SHA-256, SHA-384, and SHA-512, named after the length of the message digest each creates.

The search for the next-generation hashing algorithm was announced in the Federal Register in 2007, similar to the AES competition. It was completed in October 2012, and SHA-3 was finalized in August 2015. SHA-3 also includes SHA-224, SHA-256, SHA-384, and SHA-512 (which SHA-2 also includes) and adds two additional modes: SHAKE128 and SHAKE256. The SHAKE modes create a variable-length messages digest, unlike modes such as SHA-512 (which creates a fixed-length 512-bit message digest).

HAVAL

HAVAL (Hash of Variable Length) is a hash algorithm that creates message digests of 128, 160, 192, 224, or 256 bits in length, using 3, 4, or 5 rounds. HAVAL uses some of the design principles behind the MD family of hash algorithms, and is faster than MD5.

Slow Hash Algorithms

Hash algorithms have historically been strong (cryptographically) and fast (CPU-wise): designed for maximum cryptographic strength while using minimal resources. MD5, SHA-1, SHA-2, and SHA-3 were designed to be cryptographically strong and computationally fast. This makes sense when hashing documents but doesn’t make sense when hashing passwords.

Password hashing algorithms should be strong (cryptographically) and slow (CPU-wise). Why? To make the process of password cracking punishingly slow. We will discuss password cracking in Chapter 6, Domain 5: Identity and Access Management, here’s a preview: “While hashes may not be reversed, an attacker may run the hash algorithm forward many times, selecting various possible passwords, and comparing the output to a desired hash, hoping to find a match (and therefore deriving the original password). This is called password cracking.” Examples of slow hash algorithms commonly used for password hashing are bcrypt (based on Blowfish), and scrypt.