20. Set Map Intro
See the Java Methods Chapter 20 Powerpoint for a discussion, beginning around page 33.
Set
The interface Set
represents a mathematical set. Implementing classes include HashSet
and TreeSet
. Sets can answer the question “is this item in the set?” using the contains
method. That’s really all a set can do.
Map
The interface Map
represents a mathematical function. The inputs to the function are called “keys” and the outputs of the function are called “values”. Implementing classes include HashMap
and TreeMap
.
A map can answer the question “when I put this item in, what value comes out?” by using the get
method. There is no set
method, but you can change the output value by using put(key, newValue)
.
Hashing vs Trees
When dealing with Sets and Maps one needs to find a value quickly. There are two main ways storing the data: hashing and using a tree.
Tree
A binary tree records “smaller”, “bigger”, or “equal to” by offering a number and two choices at each step of the tree. Searching a tree for your object takes about log(N) steps, where N is the number of objects stored in the tree so far. This is slower than a hash function, and requires the objects to be ordered, but it keeps them in order - which can be very useful.
Advantages of trees: Items are kept in order. Simpler to understand.
Requirements to use a tree: items have to be Comparable
.
Hashing
Hashing gives you a “random-ish” number to associate with the thing you are storing. The number is always the same, so you can store the object at the location given by the number and look in that location (with no searching) later when you want to find the object.
Hashing Example
One simplified example is to take integers and use the remainder after dividing by a medium-sized prime (like 173). This lets someone use an array of length 173 to store numbers that may be much larger. This example puts 30 random numbers from 0 to 100,000 into an array of size 173 at predictable locations.
{
int [] data = new int[173];
for (int k=0; k<30; k++) {
int num = (int)(Math.random()*100000);
data[num%173] = num
}
}
With that code it is very fast to find if a number is in the array. (Just check to see if data[newnum%173] == newnum
.)
Dividing by a prime means smaller numbers will not be able to have short cycles - as a non-example, think what happens with the multiples of 16 when you divide by 128 (which is not prime). The “hashes” of the multiples of 16 would be 16, 32, 64, 80, 96, 112, 0, and then repeat.
Hashing Pros and Cons
Advantages of hashing: Fast. Constant time access.
Disadvantages of hashing: More complex to understand. Mistakes in coding can cause wrong answers a runtime, not just errors. Hashing does not keep the objects in any kind of order.
Requirements to use a hash: none except… if you write your own
equals
or compareTo
methods, you also need to write your own
hashCode
function.