Ideally, the hash function should produce a different integer for every key. That is, if f is the hash function and key1 and key2 are key values, then key1!=key2 implies f(key1)!=f(key2). Such a one-to-one hash function is called a perfect hash function. However, perfect hash functions are seldom possible. The problem is that if any key value might occur, every possible key value needs its own position, so the hash table has to be VERY big. For example, if the key values were 32-bit integers, the hash table would need about 4 billion (2^32) positions, because that is how many different 32-bit integers there are. (There are some special cases where perfect hash functions can be constructed, but they are rare.)
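When the full set of keys is known in advance, which is one of those special cases, you can check directly whether a candidate function is one-to-one on that set. The sketch below is a naive brute-force illustration, not a standard construction: it assumes the hash function has the form f(key)=key%m and searches for the smallest table size m that gives every key its own index. The names `is_perfect` and `smallest_perfect_size` are hypothetical, and the sample keys are the ones used in the size-7 example that follows.

```python
from typing import Iterable


def is_perfect(keys: Iterable[int], size: int) -> bool:
    """True if f(key) = key % size sends distinct keys to distinct indexes."""
    distinct = set(keys)
    return len({key % size for key in distinct}) == len(distinct)


def smallest_perfect_size(keys: Iterable[int]) -> int:
    """Smallest size m such that key % m is one-to-one on `keys` (brute force).

    Tries m = number of keys, then m + 1, and so on. The loop always ends:
    once m exceeds the difference between the largest and smallest key,
    m cannot divide the difference of two distinct keys, so no two distinct
    keys can leave the same remainder.
    """
    distinct = set(keys)
    size = max(len(distinct), 1)
    while not is_perfect(distinct, size):
        size += 1
    return size


example_keys = [27, 9, 3, 112, 19, 29]
print(is_perfect(example_keys, 6))          # False: 27 % 6 == 9 % 6 == 3
print(smallest_perfect_size(example_keys))  # 7
```

This only works because the key set is fixed; as soon as an unforeseen key arrives, the function may stop being one-to-one.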
Barring a perfect hash function, the next best thing is a hash function that distributes the keys fairly uniformly over the range of indexes. This minimizes the chance that two key values key1!=key2 have f(key1)=f(key2), but it will still happen, and when it does, the event is called a collision. Needless to say, collisions are not something you want, because if you have already put a key/pointer combination into a position of your hash table, where do you put a second one that hashes to the same position?
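To see why collisions still happen even with a good hash function, assume an idealized function that sends each key to each of the m indexes with equal probability, independently of the other keys (an assumption; real hash functions only approximate this). The chance that n keys all land in different positions is (m/m)((m-1)/m)...((m-n+1)/m), so the chance of at least one collision is one minus that product. The function name `collision_probability` is hypothetical.

```python
def collision_probability(n_keys: int, table_size: int) -> float:
    """Probability that at least two of n_keys share an index, assuming each
    key is hashed to one of table_size indexes uniformly and independently."""
    if n_keys > table_size:
        return 1.0  # pigeonhole: more keys than positions guarantees a collision
    all_distinct = 1.0
    for i in range(n_keys):
        # The (i+1)-th key must avoid the i indexes already taken.
        all_distinct *= (table_size - i) / table_size
    return 1.0 - all_distinct


print(round(collision_probability(3, 7), 3))     # 0.388
print(round(collision_probability(23, 365), 3))  # 0.507
```

Even three keys in a 7-position table collide more than a third of the time, and 23 keys spread over 365 positions (the classic birthday problem) collide more often than not.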
As an example, let us suppose we have an array of size 7, the key values are integers, and the hash function is f(key)=key%7. As is our usual way, we will not draw in the pointers to records, just the key values, and we will do some insertions into our table.
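We insert 27, 9, 3, 112, 19 and 29, in that order. Each row below shows the array (positions 0 through 6) right after the insertion named on the left:

| Insert        | 0   | 1  | 2 | 3 | 4 | 5  | 6  |
|---------------|-----|----|---|---|---|----|----|
| 27 (27%7=6)   |     |    |   |   |   |    | 27 |
| 9 (9%7=2)     |     |    | 9 |   |   |    | 27 |
| 3 (3%7=3)     |     |    | 9 | 3 |   |    | 27 |
| 112 (112%7=0) | 112 |    | 9 | 3 |   |    | 27 |
| 19 (19%7=5)   | 112 |    | 9 | 3 |   | 19 | 27 |
| 29 (29%7=1)   | 112 | 29 | 9 | 3 |   | 19 | 27 |

So far every key has landed in an empty position. If we now tried to insert, say, 16, we would get f(16)=16%7=2, but position 2 already holds 9.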
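The insertions above can be reproduced in a few lines. This sketch assumes the table is a plain list whose empty positions hold `None` and that it stores only the keys (no record pointers), as in the drawings; `insert_no_probing` is a hypothetical name. It deliberately has no collision handling, so it raises an error on the first collision.

```python
from typing import List, Optional

TABLE_SIZE = 7


def f(key: int) -> int:
    """The example's hash function: f(key) = key % 7."""
    return key % TABLE_SIZE


def insert_no_probing(table: List[Optional[int]], key: int) -> int:
    """Store key at index f(key); a position that is already taken raises."""
    index = f(key)
    if table[index] is not None:
        raise ValueError(f"collision: {key} and {table[index]} both hash to {index}")
    table[index] = key
    return index


table: List[Optional[int]] = [None] * TABLE_SIZE
for key in [27, 9, 3, 112, 19, 29]:
    insert_no_probing(table, key)
    print(f"insert {key:3}: f({key}) = {f(key)}  ->  {table}")
```

The final state is `[112, 29, 9, 3, None, 19, 27]`, matching the last row of the table. Calling `insert_no_probing(table, 16)` afterwards raises, because 16%7=2 and position 2 already holds 9.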
One solution is that if a collision occurs, you simply keep moving right (wrapping around to position 0 after the last position) until an open spot is found, and insert the value there. This is called linear probing. Another is quadratic probing, where you look first in position f(key), then in f(key)+1^2, then in f(key)+2^2, then in f(key)+3^2, ..., again wrapping around the end of the array. This has the advantage of not grouping all of the keys that hash to the same position in close proximity to each other. Another is to use a secondary hash function, say g, so that you first look at f(key), then at g(f(key)), then at g(g(f(key))), ... However, all of them suffer from the same general problem. If collisions happen too often and the hash table gets nearly full, then a delete or find for a key that is not in the table ends up looking through almost the entire hash table before it can conclude that the key is absent.
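A minimal sketch of linear and quadratic probing, assuming the same list-of-keys table as before with `None` marking an empty position, and every probe position taken modulo the table size. The function names are hypothetical, and the secondary-hash variant is omitted because it depends on the choice of g.

```python
from typing import Callable, Iterator, List, Optional

Table = List[Optional[int]]
ProbeFn = Callable[[int, int], Iterator[int]]


def linear_probes(key: int, size: int) -> Iterator[int]:
    """f(key), f(key)+1, f(key)+2, ... taken modulo size (wraps to index 0)."""
    home = key % size
    for i in range(size):
        yield (home + i) % size


def quadratic_probes(key: int, size: int) -> Iterator[int]:
    """f(key), f(key)+1*1, f(key)+2*2, f(key)+3*3, ... taken modulo size.

    Unlike linear probing, this sequence can repeat positions and skip others,
    so it may miss an empty position even when one exists.
    """
    home = key % size
    for i in range(size):
        yield (home + i * i) % size


def insert(table: Table, key: int, probes: ProbeFn) -> int:
    """Put key in the first empty position along its probe sequence."""
    for pos in probes(key, len(table)):
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError(f"no empty position found for {key}")


def find(table: Table, key: int, probes: ProbeFn) -> Optional[int]:
    """Return key's position, or None if it is absent.

    Stopping at the first empty position is only valid if nothing has been
    deleted; supporting delete needs extra bookkeeping not shown here.
    """
    for pos in probes(key, len(table)):
        if table[pos] is None:
            return None
        if table[pos] == key:
            return pos
    return None


# 9, 16 and 23 all hash to position 2 when the table size is 7.
linear: Table = [None] * 7
quadratic: Table = [None] * 7
for key in (9, 16, 23):
    insert(linear, key, linear_probes)
    insert(quadratic, key, quadratic_probes)
print(linear)     # [None, None, 9, 16, 23, None, None]
print(quadratic)  # [None, None, 9, 16, None, None, 23]
```

Three keys with the same hash value end up side by side under linear probing, while quadratic probing sends the third one further away (position 2+2^2=6).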
This is why this method of hashing, called a closed hash table, is
useful mostly when you expect that the hash table will not get
anywhere near full. In that case, the number of positions you have to
examine per operation stays small and the method works fairly well.
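To see how strongly the cost depends on how full the table is, the sketch below fills a linear-probing table of 1009 positions to several levels and measures how many positions an unsuccessful find examines on average. It is an illustration only: the names are hypothetical, the keys are drawn at random with a fixed seed, and it assumes the missing key is equally likely to hash to any position and that nothing has been deleted.

```python
import random
from typing import List, Optional

Table = List[Optional[int]]


def build_table(keys: List[int], size: int) -> Table:
    """Insert keys with linear probing, using f(key) = key % size."""
    if len(keys) > size:
        raise ValueError("more keys than positions")
    table: Table = [None] * size
    for key in keys:
        pos = key % size
        while table[pos] is not None:
            pos = (pos + 1) % size  # move right, wrapping around
        table[pos] = key
    return table


def probes_for_miss(table: Table, home: int) -> int:
    """Positions a find examines for an absent key whose hash is `home`."""
    size = len(table)
    for count in range(1, size + 1):
        if table[(home + count - 1) % size] is None:
            return count
    return size  # completely full table: every position is examined


def average_miss_cost(table: Table) -> float:
    """Average probes for an unsuccessful find over all home positions."""
    return sum(probes_for_miss(table, home) for home in range(len(table))) / len(table)


SIZE = 1009
rng = random.Random(2024)  # fixed seed so the run is repeatable
keys = rng.sample(range(10**6), SIZE)
for load in (0.25, 0.5, 0.75, 0.9, 0.99):
    table = build_table(keys[: int(load * SIZE)], SIZE)
    print(f"{load:.0%} full: {average_miss_cost(table):.1f} probes per missed find")
```

The exact numbers depend on the keys drawn, but the cost stays small at low loads and climbs sharply as the table nears full, which is the behavior described above.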