Hashing

If all you are usually interested in is looking up particular values and retrieving data, there is another method called hashing that can sometimes give very good performance. Hashing involves a hash table (an array that holds key values and pointers to the rest of the record) and a hash function that is used to decide where in the hash table a particular key/pointer pair goes. The hash function is a function that takes key values as input and produces integers that are legal positions in the hash table array. Thus, if the hash table array were an array of length N, a legal hash function for this hash table should always produce an integer which is greater than or equal to 0 and less than N.
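
As a minimal sketch of this requirement, the Python function below maps integer keys to legal positions of a table of length N. The name `hash_position` is illustrative and not part of the notes, and the sketch assumes the keys are integers.

```python
def hash_position(key: int, n: int) -> int:
    """Map an integer key to a legal index of a table of length n."""
    if n <= 0:
        raise ValueError("table length must be positive")
    # For positive n, Python's % always yields a value in 0..n-1,
    # even when key is negative.
    return key % n


N = 7
positions = [hash_position(k, N) for k in (27, 9, -4)]
print(positions)  # [6, 2, 3]
```

Whatever the key, the result can be used directly as an index into the hash table array.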

Ideally, the hash function should produce a different integer for each distinct key. That is, if f is the hash function and key1 and key2 are key values, then key1!=key2 implies f(key1)!=f(key2). Such a one-to-one hash function is called a perfect hash function. However, perfect hash functions are seldom possible. The problem is that, for arbitrary keys, the only way they can work is if the hash table is VERY big. For example, if the key values were 32-bit integers, the hash table would have to be of size about 4 billion (2^32 = 4,294,967,296) because that is how many different 32-bit integers there are. (There are some special cases where perfect hash functions can be constructed, but they are rare.)
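
To make the one-to-one condition concrete, here is a small sketch that tests whether a hash function is perfect for a given set of keys. The helper `is_perfect` and the sample keys are hypothetical, chosen only for illustration.

```python
def is_perfect(hash_fn, keys):
    """Return True if hash_fn sends every distinct key to a distinct position."""
    distinct = set(keys)
    return len({hash_fn(k) for k in distinct}) == len(distinct)


def f(key):
    return key % 7


print(is_perfect(f, [10, 21, 32, 43]))      # True: positions 3, 0, 4, 1
print(is_perfect(f, [10, 21, 32, 43, 17]))  # False: 17 and 10 both map to 3
print(is_perfect(f, range(8)))              # False: 8 keys cannot fit 7 positions
```

The last line shows why the table must be at least as large as the set of possible keys: once there are more distinct keys than positions, two of them must share a position.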

Barring a perfect hash function, the next best thing is a hash function that distributes the keys fairly uniformly over the range of indexes. This minimizes the chance that two different key values key1!=key2 have f(key1)=f(key2), but it will still happen, and when it does, the event is called a collision. Needless to say, collisions are not something you want, because if you have already put a key/pointer combination into a position in your hash table, where do you put the second one?
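
The effect of a poorly distributing hash function can be sketched by counting collisions for the same keys under two different functions. The helper `count_collisions` and the keys are made up for this illustration.

```python
from collections import Counter


def count_collisions(keys, hash_fn):
    """Count keys that hash to a position some earlier key already claimed."""
    counts = Counter(hash_fn(k) for k in keys)
    return sum(c - 1 for c in counts.values())


keys = [10, 20, 30, 40, 50, 60, 70]
print(count_collisions(keys, lambda k: k % 10))  # 6: every key maps to position 0
print(count_collisions(keys, lambda k: k % 7))   # 0: the keys spread over all 7 positions
```

For these particular keys, key%10 sends everything to one position while key%7 spreads them evenly. Other key sets would favor other functions, so this is an illustration rather than a general rule.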

As an example, let us suppose we have an array of size 7, the key values are integers, and the hash function is f(key)=key%7.  As is our usual way, we will not draw in the pointers to records but just the key values and do some insertions into our table.

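Insert 27. f(27)=27%7=6. Array (positions 0 through 6, with - marking an empty position) becomes:
[ -, -, -, -, -, -, 27 ]
Insert 9. f(9)=9%7=2. Array becomes:
[ -, -, 9, -, -, -, 27 ]
Insert 3. f(3)=3%7=3. Array becomes:
[ -, -, 9, 3, -, -, 27 ]
Insert 112. f(112)=112%7=0. Array becomes:
[ 112, -, 9, 3, -, -, 27 ]
Insert 19. f(19)=19%7=5. Array becomes:
[ 112, -, 9, 3, -, 19, 27 ]
Insert 29. f(29)=29%7=1. Array becomes:
[ 112, 29, 9, 3, -, 19, 27 ]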
Insert 31. f(31)=31%7=3. Uh-oh. We are now in trouble because the position (3) where we want to put 31 is already occupied by 3. A collision has occurred.
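
The insertions above can be replayed with a short sketch. It uses the same table size and hash function as the example. The helper `try_insert` is a hypothetical name, and it deliberately does nothing to resolve a collision.

```python
N = 7
table = [None] * N  # None marks an empty position


def f(key):
    return key % N


def try_insert(key):
    """Place key at position f(key) if it is free; return False on a collision."""
    pos = f(key)
    if table[pos] is not None:
        return False
    table[pos] = key
    return True


for key in (27, 9, 3, 112, 19, 29, 31):
    stored = try_insert(key)
    print(key, "-> position", f(key), "stored" if stored else "COLLISION")

print(table)  # [112, 29, 9, 3, None, 19, 27]
```

The first six keys are stored without trouble, and 31 is reported as a collision at position 3.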

One solution is that if a collision occurs, simply keep moving right (wrapping around to position 0 after the last position) until an open spot is found and insert the value there. This is called linear probing. Another is quadratic probing, where you look first in position f(key), then in f(key)+1^2, then in f(key)+2^2, then in f(key)+3^2, ... (again wrapping around the end of the array). This has the advantage of not grouping all of the things which hash to the same position in close proximity to each other. Another is to use a secondary hash function, say g, so that you first look at f(key), then at g(f(key)), then at g(g(f(key))), ... However, all of these methods suffer from the same general problem. If collisions happen too often and the hash table gets nearly full, then a delete or find ends up looking through almost the entire hash table whenever what you are looking for is not in the table.
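
Here is a minimal sketch of linear and quadratic probing, assuming integer keys, the hash function key%N, and a table in which nothing is ever deleted (so a search may stop at the first empty position). All function names are illustrative. Note that with this simple quadratic sequence the probes may not reach every position, so an insert can fail even when the table still has room.

```python
def linear_probe(start, n):
    """Yield start, start+1, start+2, ... wrapping around the table."""
    for i in range(n):
        yield (start + i) % n


def quadratic_probe(start, n):
    """Yield start, start+1^2, start+2^2, ... wrapping around the table."""
    for i in range(n):
        yield (start + i * i) % n


def insert(table, key, probe):
    """Store key at the first empty position on its probe sequence."""
    n = len(table)
    for pos in probe(key % n, n):
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("no free position found along the probe sequence")


def find(table, key, probe):
    """Return the position of key, or None if it is not in the table."""
    n = len(table)
    for pos in probe(key % n, n):
        if table[pos] is None:   # an empty position ends the search (no deletions assumed)
            return None
        if table[pos] == key:
            return pos
    return None


table = [112, 29, 9, 3, None, 19, 27]   # the table just before inserting 31
print(insert(table, 31, linear_probe))  # 4: position 3 is taken, 4 is open
print(find(table, 31, linear_probe))    # 4
print(find(table, 38, linear_probe))    # None, after examining all 7 positions
```

The last call shows the problem described above: the table is now full, so the unsuccessful search for 38 has to look at every position before giving up.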

This is why this method of hashing, called a closed hash table, is useful mostly when you expect that the hash table will never be filled anywhere near full. In that case, the number of positions you have to search stays small and the method works fairly well.
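
As a rough illustration of how fullness affects an unsuccessful search, the sketch below fills a table of size 7 one key at a time using linear probing. After each key it counts how many positions must be examined before concluding that the key 40 is absent. The helper names and the choice of 40 as the missing key are assumptions made for this example.

```python
def build(keys, n):
    """Insert distinct keys into a table of length n using linear probing."""
    if len(keys) > n:
        raise ValueError("more keys than positions")
    table = [None] * n
    for key in keys:
        pos = key % n
        while table[pos] is not None:  # terminates: at least one position is still free
            pos = (pos + 1) % n
        table[pos] = key
    return table


def probes_for_missing(table, key):
    """Count positions examined before concluding that key is absent."""
    n = len(table)
    pos = key % n
    for examined in range(1, n + 1):
        if table[pos] is None:
            return examined
        pos = (pos + 1) % n
    return n  # table is full: every position was examined


keys = [27, 9, 3, 112, 19, 29]
for count in range(1, len(keys) + 1):
    table = build(keys[:count], 7)
    print(count, "of 7 filled:", probes_for_missing(table, 40), "positions examined")
```

With these keys, the search for 40 examines 1 position while the table holds four or fewer keys, 4 positions with five keys, and all 7 positions with six keys.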

Lynn Ziegler, lziegler@csbsju.edu
