Skip to content
Practical TLS - October Sale
October is my birthday month, which makes for a good excuse to discount my most popular course: Practical TLS.

Use code OctBDAY2024b to get the course for $58 (originally $297).

Purchase here. Or, for more info on the top rated course on TLS, click here.

Code expires Oct 14, 2024.

Hashing Algorithm

    Hashing Algorithm

    This article is a part of a series on Cryptography. Use the navigation boxes to view the rest of the articles.

     

    Cryptography

    The first concept we need to discuss in our exploration of Cryptography is that of a Hashing Algorithm.

    A Hashing Algorithm is a mathematical formula that takes a Message of arbitrary length as input and produces as output a representational sample of the original data.

    For instance, a rudimentary example of a hashing algorithm is simply adding up all the letter values of a particular message. (A=1, B=2, C=3, etc…):

    Simple Hashing Algorithm Example

    The result of a hashing algorithm is called a message Digest (or sometimes Checksum, or Fingerprint). The result of our example hashing on the original message of hello was 52. If someone were to change our original message and process it through the same hashing algorithm, the result would be different:

    Simple Hashing Algorithm on changed message

    By comparing the message digests of each calculation, it is easy to determine that our message has changed.

    Obviously, the Hashing algorithm used in the example is full of flaws. There are many words that when processed through the example algorithm that might result in the same Digest. Had the original Message been changed to cellt, the resulting digest would still be 52, and we would be unaware that the original Message had been altered.

    In reality, a legitimate hashing algorithm must maintain four qualities before it is approved for industry usage:

    1. It is mathematically impossible to extract the original message from the digest.

    It should be impossible to reverse the hashing algorithm and recover the original Message knowing just the resulting Digest. In fact, Hashing is sometimes referred to as one-way encryption: the message can be encrypted but is impossible to decrypt. This is accomplished using one-way functions within the hashing algorithm.

    In a way, our example Hashing algorithm satisfied this condition. It is impossible to derive hello knowing only a resulting digest of 52. Mostly because there could be thousands of messages that result in the identical digest.

    1. A slight change to the original message causes a drastic change in the resulting digest.

    Any minor modification – even as small as changing a single character – to the original Message should greatly alter the computed digest. This is sometimes referred to as the Avalanche effect.

    It is possible because a Hashing algorithm is not simply one calculation. It is a series of calculations, done iteratively over and over. As a result, a small change in the beginning, creates an exponentially bigger and bigger change in the resulting digest. Just like a snowball tumbling down a mountain forming an avalanche.

    1. The result of the hashing algorithm is always the same length.

    It is vital for the resulting Digest to not provide any hints or clues about the original Message – including its length. A digest should not grow in size as the length of the Message increases.

    In our example Hashing algorithm, the longer the word, the bigger the resulting digest would be as we are adding more and more letters together. However in an industry approved hashing algorithm, hashing the word hello would produce a digest the same size as hashing the entire library of congress.

    1. It is infeasible to construct a message which generates a given digest.

    With our example hashing algorithm, if given the digest of 52 , it would not be overly difficult to generate a list of words that might have been the original message. This is what this attribute is trying to prevent.

    In a proper hashing algorithm, this should be infeasible — short of attempting every possible combination of messages until you found a match (aka, brute-forcing the algorithm). But even this becomes infeasible given a large enough digest size.

    In the next article in this series, we will look at exactly how Hashing Algorithms are used to detect modified messages. But for now, we will continue to look at additional aspects of Hashing Algorithms.

    Digest Lengths

    Below is a table with commonly seen, industry recognized hashing algorithms:

    Algorithm Digest Length
    MD5 128 Bits
    SHA or SHA1 160 Bits
    SHA384 384 Bits
    SHA256 256 Bits

    Each of these Hashing algorithms satisfy the four cryptography hashing algorithm properties, as described above. The primary difference between each of them is the size of the resulting digest.

    As with passwords, it is typically considered that a hashing algorithm which results in a longer digest tends to be regarded as more secure.

    Hashing Demonstration with Linux

    To take it a step further, I would like to demonstrate how you can use a hashing algorithm from a standard Linux terminal. Note that if you are unfamiliar with Linux, feel free to skip this section — it is not crucial to learning the aforementioned concepts. If you have at least some Linux familiarity, however, it might help to see these algorithms in action.

    The standard Linux terminal typically comes with at least two of the Hashing Algorithms mentioned above:  MD5 and SHA1. You can use the echo command along with md5sum or sha1sum to run either algorithm on a string of text:

    $ echo "Practical Networking .net" | md5sum
    018aa3ff55842e546c661b7027aed5d7 -
    

    If you typed the exact same command into any Linux terminal, the resulting digest would be the exact same. In fact, if you fed the string Practical Networking .net into any MD5 algorithm, you would see the exact same digest (remember, the echo command also appends a new line character to the string). If we were to change something small, the resulting digest should be completely different. For example, we can capitalize the n in .net and take a look at how the digest changes:

    $ echo "Practical Networking .Net" | md5sum
    6b9298494fb90a1a57efddaee60fbfc1 -
    

    We could also run a much larger sample of text through the MD5 algorithm. We can echo the same string 10,000 times and calculate the md5sum, and notice the resulting digest is still the same length (but the digest is different, of course):

    $ for i in {1..10000}; do echo "Practical Networking.net"; done | md5sum
    938a98abd28c5da2dee6aa07bbc25134 -
    

    Try these same examples using sha1sum. If you don’t have easy access to a Linux shell, you can use a free online Linux terminal.


    Prefer video content to text? The majority of this article has been recorded and can be viewed on Youtube:

    Series NavigationMessage Integrity >>
    5 7 votes
    Article Rating
    Subscribe
    Notify of

    1 Comment
    Oldest
    Newest Most Voted
    Inline Feedbacks
    View all comments

    Good work