Hash Function Comprehension

RoyalTech · February 2012

I'm about a week away from taking the N+ exam and was having a hell of a time understanding hash functions. The books I'm reading gave me the impression that they were used to encrypt the data that was sent and I was thinking, what the hell good is it if you need the unencrypted data to check the checksum against? It seemed more or less useless as opposed to simply using an asymmetric algorithm.

I have now read some of the posts on this subject here and I think it all makes sense now. Let me know if these comments about hash functions are correct.

A hash is used simply to verify that the received data is the same as the sent data and is not used to encrypt the data
To actually encrypt the data sent, another method of encryption is needed such as an asymmetric algorithm
Technically, the checksum created by a hash function is unable to guarantee its uniqueness due to its fixed length compared to the number of different types of data that could potentially be sent
It can be used with passwords by a Website or Cisco device which stores the initial checksum and checks it against any future checksum sent by the user instead of having to send the actual password

Is my comprehension of this good? I must say that after reading the threads on this topic, I think this site is a Godsend. Thanks

Webmaster · February 2012

RoyalTech wrote:

Is my comprehension of this good?

Excellent, except for one thing, which is that you didn't mention 'the' term hashing is all about: Integrity (summed up in your first bulleted item). And of course you need to know about the popular/common hashing algorithms (like MD5, SHA and RIPEMD).

Good luck next week!

RoyalTech · February 2012

My first comment was meant to imply the concept of integrity. As far as the algorithms go, I was aware of the first two but have never heard of RIPEMD. To be honest, I haven't seen it on any practice exams or the objectives either. Why do you specifically mention this algorithm?

Webmaster · February 2012

RoyalTech wrote: »

As far as the algorithms go, I was aware of the first two but have never heard of RIPEMD. To be honest, I haven't seen it on any practice exams or the objectives either. Why do you specifically mention this algorithm?

Check out the example items in exam objective "6.2 Use and apply appropriate cryptographic tools and products" - it is there.

RoyalTech · February 2012

You are going on the objectives of the SY0-301. I am taking the Network+ exam.

RoyalTech · February 2012

Since I'm at it, maybe I can ask you another question that isn't really security related. I'm seeing a question that asks what device "session affinity" is a feature of. Can you fill me in on what session affinity is and how it applies to load balancing because I can't find anything on it.

ChooseLife · February 2012

RoyalTech wrote: »

Since I'm at it, maybe I can ask you another question that isn't really security related. I'm seeing a question that asks what device "session affinity" is a feature of. Can you fill me in on what session affinity is and how it applies to load balancing because I can't find anything on it.

Though you asked Johan, hope you don't mind me answering - jumped in cause it's something I work with on a daily basis.

Session affinity is a term from the load-balancing domain. It generally refers to "stickiness" of a session and indicates how requests having a certain common factor should be handled. In the most common scenario - multiple Web servers behind an LB - it is often desirable to have a single server (rather than many) handle the session initiated by the client. "Common factors" mentioned above can be a source IP or HTTP cookie, or something else, depending on a particular system.

A non-tech example: when you go to a bank to withdraw money, you get into a queue and are serviced by the next available teller. This would be "session affinity set to none". Both in banking and computers, it is done because it is the most efficient way to handle a large number of short and simple requests that are considered done at the end of the interaction.
A contrasting example is a visit to a family doctor. One may first arrive at the clinic without any specific considerations as to which doctor to be assigned to. Once assigned, the patient may keep seeing the same doctor, even if it takes longer to get the follow-up appointment, because the doctor is already familiar with the case and is in a better position to treat the patient than a colleague who would be seeing the patient for the first time. This would be "session affinity set to individual patient".

If for whatever reason the clinic decides to assign the person's family to the same doctor, we can say they have "session affinity set to family"
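For anyone who prefers code to analogies, here is a rough Python sketch of the same idea. The server names and the choice of hashing the key are my own assumptions for illustration; real load balancers each have their own way of doing this. With no affinity, every request goes to the next server in line. With affinity, the "common factor" (a source IP, a cookie value, a family ID) decides the server, so the same key always lands in the same place.

```python
import hashlib
from itertools import cycle

SERVERS = ["web1", "web2", "web3"]  # hypothetical back-end servers
_round_robin = cycle(SERVERS)

def pick_no_affinity() -> str:
    """Affinity set to none: hand the request to the next server in line."""
    return next(_round_robin)

def pick_by_key(key: str) -> str:
    """Affinity on a common factor: the same key always maps to the same server."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]

# Three requests with no affinity visit each server once.
no_affinity = [pick_no_affinity() for _ in range(3)]

# Three requests from one source IP (a documentation-range address) stick to one server.
sticky = [pick_by_key("203.0.113.7") for _ in range(3)]

print(no_affinity)
print(sticky)
```

Passing a cookie value or a household ID as the key instead of the IP gives the "individual patient" and "family" flavors from the examples above.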

RoyalTech · February 2012

Thanks ChooseLife! You can always interrupt. I think that's what these forums are for. I need to think about your post for a little while and let it sink in before I can make a comment. Since I'm falling asleep at my desk, now is not a good time for that. I appreciate your input though. I sure wish I had gotten on this forum a while back. I have a feeling I'll be around for a while whether I'm studying for an exam or not.

RoyalTech · February 2012

The main question I have regarding your description is that I have understood the purpose of load balancing to be for the spreading out of the client request workload. Your definition of session affinity, depending on the common factors used, would seem to go against the purpose of load balancing. For instance, if the common factor used is the location of the requesting client, a heavily populated or heavily trafficked area would bog down one server, leaving another server from a less populated area idle. How do the common factors used prevent such scenarios?

ChooseLife · February 2012

RoyalTech wrote: »

The main question I have regarding your description is that I have understood the purpose of load balancing to be for the spreading out of the client request workload. Your definition of session affinity, depending on the common factors used, would seem to go against the purpose of load balancing. For instance, if the common factor used is the location of the requesting client, a heavily populated or heavily trafficked area would bog down one server, leaving another server from a less populated area idle. How do the common factors used prevent such scenarios?

That is a very good observation. Yes, you are correct in that session affinity goes against the idea of load-balancing. It does indeed, and so session affinity is enabled only when needed - i.e. when having a session handled in a persistent way (by a single server) is more important than distributing the load evenly. Sometimes there are technical requirements (server application not capable of fully handing complete state information from one server over to another) and sometimes there are performance reasons (overhead on the application to load/unload session information).
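To put some numbers on that trade-off, here is a small simulation sketch in Python. Everything in it (two servers, the session names, the request mix) is made up for illustration. It sends the same ten requests through a plain round-robin balancer and through a sticky one that remembers which server each session was first given.

```python
from collections import Counter
from itertools import cycle

SERVERS = ["web1", "web2"]  # hypothetical back-end servers

def simulate(session_ids, sticky: bool) -> Counter:
    """Count requests per server, with or without session affinity."""
    next_server = cycle(SERVERS)
    assigned = {}      # session id -> server chosen on its first request
    load = Counter()
    for session_id in session_ids:
        if sticky:
            if session_id not in assigned:
                assigned[session_id] = next(next_server)
            server = assigned[session_id]
        else:
            server = next(next_server)
        load[server] += 1
    return load

# One chatty session and two quiet ones.
requests = ["busy"] * 8 + ["quiet-1", "quiet-2"]

even_load = simulate(requests, sticky=False)
sticky_load = simulate(requests, sticky=True)
print(dict(even_load))
print(dict(sticky_load))
```

Without affinity the load splits evenly. With affinity, whichever server drew the chatty session ends up doing most of the work, which is exactly the cost you accept when keeping a session on one server matters more.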

RoyalTech · February 2012

So, essentially, session affinity requires the technology of load balancing so that all requests start at a single server where it then takes over the process and assigns the request to a specific server (as opposed to a round robin distribution) based on a common factor of the requesting client.

BTW, that's called a carry-on sentence in case you were wondering.

Thanks for the feedback. I get a lot of satisfaction knowing that I am at least making good observations about a concept. To me, that means I'm understanding this stuff at least a little bit.

Darril · February 2012

RoyalTech wrote: »

I'm about a week away from taking the N+ exam and was having a hell of a time understanding hash functions. The books I'm reading gave me the impression that they were used to encrypt the data that was sent and I was thinking, what the hell good is it if you need the unencrypted data to check the checksum against? It seemed more or less useless as opposed to simply using an asymmetric algorithm.

I have now read some of the posts on this subject here and I think it all makes sense now. Let me know if these comments about hash functions are correct.
A hash is used simply to verify that the received data is the same as the sent data and is not used to encrypt the data

To actually encrypt the data sent, another method of encryption is needed such as an asymmetric algorithm

Technically, the checksum created by a hash function is unable to guarantee its uniqueness due to its fixed length compared to the number of different types of data that could potentially be sent

It can be used with passwords by a Website or Cisco device which stores the initial checksum and checks it against any future checksum sent by the user instead of having to send the actual password

Is my comprehension of this good? I must say that after reading the threads on this topic, I think this site is a Godsend. Thanks

Overall, I think you have a good understanding of hashing. Before I mention a small clarification, let me stress that your first bullet and Johan's point are on target - the primary purpose of hashing is integrity.

However, hashing is sometimes referred to as one-way encryption, which can add a little confusion. In its simplest terms, a hash is simply a number, and a hashing function is a mathematical algorithm calculated against a string of data (such as a password, message, or file). As long as the original data is the same, the hashing function will always produce the same hash (number). It is called one-way encryption because this number cannot be used to reliably reproduce the original data. An MD5 hash is always 128 bits (32 hexadecimal characters), so a hash such as 1234567890ABCDEF1234567890ABCDEF could be created from a password of P@ssw0rd, an email message, or a 5 MB file.
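If you want to see this for yourself, here is a minimal sketch using Python's standard hashlib module. The sample message and the block of zero bytes standing in for a 5 MB file are my own placeholders. It shows two things: the same input always gives the same hash, and the hash has the same length no matter how big the input is.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 hash of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()

password = b"P@ssw0rd"
message = b"Meeting moved to 3 PM, see you there."   # placeholder email text
big_file = b"\x00" * (5 * 1024 * 1024)               # stand-in for a 5 MB file

digests = [md5_hex(item) for item in (password, message, big_file)]
for digest in digests:
    # Each line prints the hash length followed by the hash itself.
    print(len(digest), digest)

# Hashing the same data again gives the same hash.
same_again = md5_hex(b"P@ssw0rd") == digests[0]
print(same_again)
```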

In contrast, encryption algorithms are used to cipher data to protect confidentiality (or prevent unauthorized disclosure) but encrypted data can be decrypted. In other words, encryption used for confidentiality is two-way encryption.
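Here is a small Python sketch of the one-way versus two-way difference. Be aware that the XOR "cipher" below is a toy I picked only because it is short and reversible; it is not secure and is not what real products use. The key is also just a made-up value. The point is that the ciphertext can be turned back into the original with the key, while the hash has no function that takes you back.

```python
import hashlib
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy two-way cipher: XOR each byte with a repeating key. NOT secure."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"demo-key"          # hypothetical key, for illustration only
plaintext = b"P@ssw0rd"

# Two-way: applying the same key again recovers the original data.
ciphertext = xor_cipher(plaintext, key)
recovered = xor_cipher(ciphertext, key)

# One-way: there is no matching "unhash" step for this value.
digest = hashlib.sha256(plaintext).hexdigest()

print(ciphertext != plaintext, recovered == plaintext)
print(digest)
```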

All that said, while these concepts are highly relevant to the Security+ exam, I doubt you'll come across them in the Network+ exam.

Good luck.

RoyalTech · February 2012

Darril, Thanks for your comment. As far as its relevancy to the N+ exam, it is more of a case of I just wanted to know and it was bothering the hell out of me. I'm a very detailed type of person and need to know everything in depth to feel any sort of confidence on the subject in general. Simply knowing the definition of something or the how of something isn't good enough for me. I need to know the what, why, etc. Also, the Security+ is what I plan on doing once I take the N+ this week so it will be relevant at that point.

Regarding the use of the term one-way encryption, I think that is used for situations like the one that Johan mentioned in another post about a Cisco device storing the hash (I think the terms checksum and digest are synonymous) for a password and then simply checking the hash in future logins instead of having the user send the password every time. In this scenario, I could see how it is called one-way encryption although, in the truest definition of the term, it is not. I think I do have a fairly decent understanding of the difference between encryption and hashing at this point, thanks to this forum!

One thing I have read is that the DoD has officially said that they consider MD5 to no longer be a very good hash function and recommend against its further use. I think it has to do with the length of the resulting hash and the possibility of identical hashes for different data. How does the industry look at this? Is it still used regularly or does it seem to be fading out?

Darril · February 2012

RoyalTech wrote: »

Regarding the use of the term one-way encryption, I think that is used for situations like the one that Johan mentioned in another post about a Cisco device storing the hash (I think the terms checksum and digest are synonymous) for a password and then simply checking the hash in future logins instead of having the user send the password every time. In this scenario, I could see how it is called one-way encryption although, in the truest definition of the term, it is not. I think I do have a fairly decent understanding of the difference between encryption and hashing at this point, thanks to this forum!

Yes, the term one-way encryption is often used when discussing hashing a password and storing the hash instead of the actual password.
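As a rough illustration of storing the hash instead of the password, here is a Python sketch using only the standard library. The random salt, the PBKDF2 function, and the iteration count are my additions and assumptions, not something from the posts above, and I am not claiming this is how any particular website or Cisco device does it. The idea is simply that only the salt and hash are kept, and a login attempt is hashed the same way and compared.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # assumed work factor, chosen only for this sketch

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the password and salt with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(password: str) -> Tuple[bytes, bytes]:
    """Return the (salt, hash) pair to store; the password itself is not kept."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)

def verify(attempt: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the login attempt the same way and compare it with the stored hash."""
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)

salt, stored_hash = register("P@ssw0rd")
good_login = verify("P@ssw0rd", salt, stored_hash)
bad_login = verify("password", salt, stored_hash)
print(good_login, bad_login)
```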

And yes, many people have been able to get a better understanding of many different topics thanks to these forums. I applaud the work done by Johan and all the moderators making it so easy for people to share information.

RoyalTech wrote: »

One thing I have read is that the DoD has officially said that they consider MD5 to no longer be a very good hash function and recommend against its further use. I think it has to do with the length of the resulting hash and the possibility of identical hashes for different data. How does the industry look at this? Is it still used regularly or does it seem to be fading out?

Yes, you're correct that the 128-bit MD5 hash is susceptible to identical hashes for different data (also called collisions). For example, if one password gives a hash of 123 and an attacker can guess a different password that also gives a hash of 123, it's a collision. If you want to make collisions less likely, use more bits. Other hashing functions using more bits, such as SHA-1 with 160 bits or SHA-2 with up to 512 bits, are less susceptible to collisions and are replacing MD5 in security applications where collisions are a concern.
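To see why more bits help, here is a Python sketch that shrinks a hash on purpose and then hunts for a collision by brute force. Cutting SHA-256 down to its first few bytes is my own stand-in for "a hash with fewer bits", and the made-up passwords are only there to have something to hash. Nothing here attacks MD5 itself.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, n_bytes: int) -> bytes:
    """First n_bytes of SHA-256, standing in for a hash with fewer bits."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int):
    """Hash made-up passwords until two different ones share a hash."""
    seen = {}  # hash -> first input that produced it
    for i in count():
        candidate = f"password{i}".encode()
        h = truncated_hash(candidate, n_bytes)
        if h in seen:
            return seen[h], candidate, i + 1
        seen[h] = candidate

results = {}
for n_bytes in (1, 2, 3):  # 8-bit, 16-bit and 24-bit hashes
    first, second, attempts = find_collision(n_bytes)
    results[n_bytes] = (first, second, attempts)
    print(n_bytes * 8, "bits:", first, second, "after", attempts, "hashes")
```

Each extra byte makes the search noticeably longer, and at 128 bits or more this kind of blind guessing stops being practical.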

However, MD5 is still used regularly because collisions aren't always a concern. For example, many web sites use hashes to help verify the integrity of downloads, and MD5 is used because collisions do not represent a realistic risk.

Imagine you wanted to download the Android Software Development Kit (SDK) from the Android Developers site (Android SDK | Android Developers). One of the biggest risks is that the files you are downloading are infected with malware. The site owner calculated the MD5 hash on a clean file, and posted it. You can calculate the hash on the downloaded file and compare the two. If they are the same, the file is clean. If they are different, the downloaded file has a problem which could indicate it has been infected with a virus.
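The check itself only takes a few lines. Here is a sketch in Python, with the caveat that the function names and the temporary demo file are mine, and a real download would of course be compared against whatever hash the site actually publishes.

```python
import hashlib
import hmac
import os
import tempfile

def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Compute the MD5 hash of a file, reading it in chunks to save memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare the file's hash with the hash posted by the site owner."""
    return hmac.compare_digest(file_md5(path), published_hex.strip().lower())

# Demo with a small temporary file standing in for the download.
contents = b"pretend this is the SDK installer"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(contents)
    demo_path = tmp.name

published = hashlib.md5(contents).hexdigest()       # what the owner would post
ok = matches_published(demo_path, published)        # untouched file
tampered = matches_published(demo_path, "0" * 32)   # hash that does not match
os.remove(demo_path)
print(ok, tampered)
```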

Is it possible for an attacker to infect the file and also modify it so that it produces the same hash (a collision)? Possibly. However, the amount of time required to do this simply isn't worth it to an attacker. By the time the attacker succeeds, the developer may easily have replaced the original file with an update and the attacker would have to start over.

HTH,

RoyalTech · February 2012

I assumed that MD5 was only a concern in areas where the highest level of security was needed. Otherwise, as you stated, it still works pretty well for most purposes. I didn't think about it being used as a protection against malware. I can definitely see how that would work though and I'm glad you mentioned it as an example.

I'm amazed at the difference in my understanding of both hashes and session affinity with the help of you and the others on this thread. There isn't a book that I pick up that doesn't skip steps in explaining one concept or another. Even when I use multiple books and the web, it is sometimes difficult for me to find an explanation that I am comfortable with. The people who are writing the books or blog articles always make assumptions as they explain something, which always leaves something missing in a proper explanation. The other thing that tends to confuse things for me is the fact that there are often multiple terms for the same thing, many of which apply to other concepts that are similar.

Anyways, thanks again! I suspect I'll be a regular here and look forward to the day that I am giving the answers more than I'm asking the questions.
