Array vs Map lookups in Java

Array vs map: which do you think is faster for lookups? Arrays and maps are critical tools in a Java programmer’s arsenal. So which is the most efficient? Of course it’s maps, with their O(1) lookups! Or could arrays win in some scenarios?

A lot of people claim that arrays are faster for a small number of elements. It is always better to run a performance test so we can confirm this for ourselves.

For the purpose of this exercise we will look at HashMaps, HashSets, plain arrays and Arrays.binarySearch. I’ll be using JMH with 5 warm-up iterations, 5 measurement iterations and 3 forks. The test was run only on an Asus laptop (Intel i7-6700HQ CPU), Windows 10 and a 64-bit Oracle JRE, build 1.8.0_241-b07.

JMH is a Java harness for building, running, and analyzing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM.
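For reference, the 5 warm-up / 5 measurement / 3 fork configuration mentioned above can be expressed with JMH annotations. This is only a minimal sketch: the class name LookupBenchmark and the throughput mode are placeholders, and the actual project may configure these options through the JMH runner instead.

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;

@Warmup(iterations = 5)          // 5 warm-up iterations
@Measurement(iterations = 5)     // 5 measurement iterations
@Fork(3)                         // 3 separate JVM forks
@BenchmarkMode(Mode.Throughput)  // report operations per unit of time (assumed mode)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class LookupBenchmark {
    // @Benchmark methods and the @State class go here
}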

Array vs map: Searching for objects by name

First of all we will create a simple object for this exercise.

static class Person {  
    final int age;  
    final String name;  
  
    public Person(int age, String name) {  
        this.age = age;  
        this.name = name;  
    }
  // ... equals, hashCode, toString 
}

For column storage, I used a simple class called PersonColumns. It stores the Person objects as well, so retrieval does not require allocating new objects.

static class PersonColumns {  
    final String[] names;
    final int[] ages;
    final Person[] people;
  
    PersonColumns(int size) {  
        this.names = new String[size];  
        this.ages = new int[size];  
        this.people = new Person[size];  
    }  
  
    Person get(int index) {  
        return people[index];  
    }  
  
    void set(int index, Person person) {  
        names[index] = person.name;  
        ages[index] = person.age;  
        people[index] = person;  
    }  
}

Then I created a BenchmarkState object to use with JMH benchmarks.

@State(Scope.Benchmark)  
public static class BenchmarkState {  
    final Person[] people;  
    final Map<String, Person> namePersonMap;  
    final String[] nameLookups;  
    final PersonColumns personColumns;
    //...

Next, I created two methods to generate a random age and a random name.

   public static String name() {  
      return RandomStringUtils.random(10, ALPHA);  
   }  
  
   public static int age() {  
      return RandomUtils.nextInt(20, 41); 
   }
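ALPHA here is a constant holding the set of characters that RandomStringUtils.random picks the 10 name characters from; its definition is not shown above. As a hypothetical example, it could look like this:

// Hypothetical definition; the actual constant in the source may differ.
private static final String ALPHA = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";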

Finally, I wrote the generators for the people array and the lookup array. The lookup array consists of 5 names taken from the people array and 5 random ones. I decided that everything should be random to keep the comparison fair; since we are running multiple iterations, this should give us a reasonable result.

// Generate people  
IntStream.range(0, PEOPLE).forEach(value -> {  
    Person per = new Person(age(), name());  
    people[value] = per;  
    namePersonMap.put(per.name, per);  
    personColumns.set(value, per);  
});  
System.out.println("People = " + Arrays.toString(people));  
  
// Generate lookups 50% random, 50% from actual people  
IntStream.range(0, LOOKUPS / 2).forEach($ -> lookups.add(name()));  
IntStream.range(0, LOOKUPS / 2).forEach($ -> lookups.add(  
        String.format("%s", people[RandomUtils.nextInt(0, PEOPLE)].name)));  
nameLookups = lookups.toArray(new String[0]);  
ArrayUtils.shuffle(nameLookups); // mix random and actual by shuffling

As for the benchmarks, I wrote the methods below.

@Benchmark  
public void searchArray(BenchmarkState x, Blackhole blackhole) {  
    for (final String name: x.nameLookups) {  
        blackhole.consume(findFromArray(x, name));  
    }  
}  
  
public Person findFromArray(BenchmarkState x, String name) {  
    for (Person p : x.people) {  
        if (name.equals(p.name)) {  
            return p;  
        }  
    }  
    return NULL;  
}  
  
@Benchmark  
public void searchWithHash(BenchmarkState x, Blackhole blackhole) {  
    for (final String name: x.nameLookups) {  
        blackhole.consume(findFromArrayWithHash(x, name));  
    }  
}  
  
public Person findFromArrayWithHash(BenchmarkState x, String name) {  
    int nameHash = name.hashCode();  
    for (Person p : x.people) {  
        if (nameHash == p.name.hashCode() && name.equals(p.name)) {  
            return p;  
        }  
    }  
    return NULL;  
}  
  
@Benchmark  
public void searchWithHashPersonColumns(BenchmarkState x, Blackhole blackhole) {  
    for (final String name: x.nameLookups) {  
        blackhole.consume(fromHashPersonColumns(x, name));  
    }  
}  
  
public Person fromHashPersonColumns(BenchmarkState x, String name) {  
    int nameHash = name.hashCode();  
    for (int i = 0; i < x.personColumns.names.length; i++) {  
        if (x.personColumns.names[i].hashCode() == nameHash && x.personColumns.names[i].equals(name)) {  
            return x.personColumns.get(i);  
        }  
    }  
    return NULL;  
}  
  
@Benchmark  
public void fromMap(BenchmarkState x, Blackhole blackhole) {  
    for (final String name: x.nameLookups) {  
        blackhole.consume(findFromMap(x, name));  
    }  
}  
  
public Person findFromMap(BenchmarkState x, String name) {  
    return x.namePersonMap.getOrDefault(name, NULL);  
}

Now let’s run this for 10, 20 and 30 elements. The number of lookups will be kept at 10.
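One convenient way to run the same benchmarks for several sizes is JMH’s @Param. The sketch below is just one way to do it (the original code may simply change a PEOPLE constant between runs); note that with @Param the state has to be built in a @Setup method rather than in the constructor, because parameter values are injected after construction.

@State(Scope.Benchmark)
public static class BenchmarkState {
    @Param({"10", "20", "30"})   // hypothetical: number of people per run
    int peopleCount;

    Person[] people;
    Map<String, Person> namePersonMap;
    String[] nameLookups;

    @Setup(Level.Trial)
    public void setUp() {
        people = new Person[peopleCount];
        namePersonMap = new HashMap<>();
        // ... generate people and the 10 lookups as shown earlier
    }
}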

As you can see in the results above, array lookup performance decreases as the number of elements grows. I also tried a hashCode-based array lookup, which turned out to be faster than the plain String-comparison loop. Map lookup performance stays roughly constant. The map is the clear winner in this scenario, as we predicted; even searching over columns cannot beat the HashMap.

Array vs map: Searching for objects by ID

I added a final int id field to the same Person class. Instead of looking up Strings, we now look up an int ID. I also added an int[] ids array to PersonColumns.
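For completeness, here is roughly what those additions look like. This is a sketch based on the description above, so the actual classes in the repository may differ in detail.

static class Person {
    final int id;
    final int age;
    final String name;

    public Person(int id, int age, String name) {
        this.id = id;
        this.age = age;
        this.name = name;
    }
    // ... equals, hashCode, toString
}

static class PersonColumns {
    final int[] ids;
    final String[] names;
    final int[] ages;
    final Person[] people;

    // constructor, get(index) and set(index, person) as before,
    // with set(...) also filling ids[index] = person.id
}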

@State(Scope.Benchmark)  
public static class BenchmarkState {  
    final Person[] sortedPeople;  
    final Person[] unsortedPeople;  
    final PersonColumns sortedPersonColumns;  
    final Map<Integer, Person> namePersonMap;  
    final int[] idLookups;  
    int curId;
    
    public BenchmarkState() {
        // ...

We initialize curId to a random value and increment it by a random step each time.

public int id() {  
    curId = curId + RandomUtils.nextInt(2, 10);  
    return curId;  
}
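The initialization itself is not shown; at the top of the constructor it can be as simple as seeding curId once. A minimal sketch, with a hypothetical range, using the RandomUtils class from Apache Commons Lang already used elsewhere:

// inside the BenchmarkState constructor, before generating people
curId = RandomUtils.nextInt(0, 100);   // hypothetical starting range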

We can now create Person objects and lookups.

// Generate people  
IntStream.range(0, PEOPLE).forEach(value -> {  
    Person per = new Person(id(), age(), name());  
    sortedPeople[value] = per;  
    namePersonMap.put(per.id, per);  
    sortedPersonColumns.set(value, per);  
});  
System.out.println("People = " + Arrays.toString(sortedPeople));  
  
// Generate lookups 50% random, 50% from actual people  
IntStream.range(0, LOOKUPS / 2).forEach($ -> lookups.add(id()));  
IntStream.range(0, LOOKUPS / 2).forEach($ -> lookups.add(sortedPeople[RandomUtils.nextInt(0, PEOPLE)].id));  
idLookups = lookups.stream().mapToInt(Integer::intValue).toArray();  
ArrayUtils.shuffle(idLookups); // mix random and actual by shuffling  
unsortedPeople = Arrays.copyOf(sortedPeople, sortedPeople.length);  
ArrayUtils.shuffle(unsortedPeople); // We are shuffling people to mix

Next, I wrote the benchmarks below.

@Benchmark  
public void searchArrayNaiveSorted(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.idLookups) {  
        blackhole.consume(findFromArrayNaiveSorted(x, id));  
    }  
}  
  
public Person findFromArrayNaiveSorted(BenchmarkState x, int id) {  
    for (Person p : x.sortedPeople) {  
        if (id == p.id) {  
            return p;  
        }  
    }  
    return NULL;  
}  
  
@Benchmark  
public void searchArrayNaiveUnsorted(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.idLookups) {  
        blackhole.consume(findFromArrayNaiveUnsorted(x, id));  
    }  
}  
  
public Person findFromArrayNaiveUnsorted(BenchmarkState x, int id) {  
    for (Person p : x.unsortedPeople) {  
        if (id == p.id) {  
            return p;  
        }  
    }  
    return NULL;  
}  
  
@Benchmark  
public void searchBinarySearch(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.idLookups) {  
        blackhole.consume(findWithBinarySearch(x, id));  
    }  
}  
  
public Person findWithBinarySearch(BenchmarkState x, int id) {  
    // Compare only IDs, since they are unique  
    int index = Arrays.binarySearch(x.sortedPeople,  
        new Person(id, 0, ""), Comparator.comparingInt(o -> o.id));  
    // this returns `-insertion point - 1` if not found  
    if (index < 0) {  
        return NULL;  
    }  
    return x.sortedPeople[index];  
}  
  
@Benchmark  
public void searchColumnsBinarySearch(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.idLookups) {  
        blackhole.consume(findWithColumnsBinarySearch(x, id));  
    }  
}  
  
public Person findWithColumnsBinarySearch(BenchmarkState x, int id) {  
    int index = Arrays.binarySearch(x.sortedPersonColumns.ids, id);  
    if (index < 0) {  
        return NULL;  
    }  
    return x.sortedPersonColumns.get(index);  
}  
  
@Benchmark  
public void searchColumnNaiveSearch(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.idLookups) {  
        blackhole.consume(findWithColumnNaiveSearch(x, id));  
    }  
}  
  
public Person findWithColumnNaiveSearch(BenchmarkState x, int id) {  
    for (int i = 0; i < x.sortedPersonColumns.ids.length; i++) {  
        if (x.sortedPersonColumns.ids[i] == id) {  
            return x.sortedPersonColumns.get(i);  
        }  
    }  
    return NULL;  
}  
  
@Benchmark  
public void fromMap(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.idLookups) {  
        blackhole.consume(findFromMap(x, id));  
    }  
}  
  
public Person findFromMap(BenchmarkState x, int id) {  
    return x.namePersonMap.getOrDefault(id, NULL);  
}

This is interesting. As in the previous experiment, the map’s performance remains roughly constant. However, the column-based binary search is impressive: it maintains reasonable performance for up to 20 elements. The advantage of the column-based approach is that we only deal with primitive ints, not objects, which gives the JVM plenty of opportunity to optimize.

Why do sorted arrays appear to be slower? What about branch prediction? [1] Since we return as soon as we find an element instead of processing the whole array, the search is just a matter of reaching the matching object sooner, so the branch-prediction advantage of a sorted array is lost.

Array vs map: Contains Integer

If you want to check whether an element belongs to a set of elements, you can use HashMap.containsKey(key), a linear array search, a binary search over a sorted array, or HashSet.contains(key). The state for this scenario is generated as follows.

// Generate elements  
IntStream.range(0, ELEMENTS).forEach(value -> {  
    int elem = id();  
    elementsUnsorted[value] = elem;  
    elementToTrueMap.put(elem, true);  
    elementSet.add(elem);  
});  
// Generate lookups 50% random, 50% from actual elements  
IntStream.range(0, LOOKUPS / 2).forEach($ -> lookups.add(id()));  
IntStream.range(0, LOOKUPS / 2).forEach($ -> lookups.add(  
        elementsUnsorted[RandomUtils.nextInt(0, ELEMENTS)]));  
elementLookups = lookups.stream().mapToInt(Integer::intValue).toArray();  
ArrayUtils.shuffle(elementLookups); // mix random and actual by shuffling  
elementsSorted = Arrays.copyOf(elementsUnsorted, elementsUnsorted.length);  
ArrayUtils.shuffle(elementsUnsorted); // make it truly unsorted  
  
System.out.println("Elements Unsorted = " + Arrays.toString(elementsUnsorted));  
System.out.println("Elements Sorted = " + Arrays.toString(elementsSorted));  
System.out.println("Elements Lookups = " + Arrays.toString(elementLookups));

Next, I wrote the benchmarks for this scenario.

@Benchmark  
public void searchArraySorted(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.elementLookups) {  
        blackhole.consume(findFromArraySorted(x, id));  
    }  
}  
  
public boolean findFromArraySorted(BenchmarkState x, int id) {  
    for (int p : x.elementsSorted) {  
        if (id == p) {  
            return true;  
        }  
    }  
    return false;  
}  
  
@Benchmark  
public void searchArrayUnsorted(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.elementLookups) {  
        blackhole.consume(findFromArrayUnsorted(x, id));  
    }  
}  
  
public boolean findFromArrayUnsorted(BenchmarkState x, int id) {  
    for (int p : x.elementsUnsorted) {  
        if (id == p) {  
            return true;  
        }  
    }  
    return false;  
}  
  
  
@Benchmark  
public void fromMapGet(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.elementLookups) {  
        blackhole.consume(findFromMap(x, id));  
    }  
}  
  
public boolean findFromMap(BenchmarkState x, int id) {  
    return x.elementToTrueMap.getOrDefault(id, false);  
}  
  
@Benchmark  
public void fromMapContainsKey(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.elementLookups) {  
        blackhole.consume(findFromMapContainsKey(x, id));  
    }  
}  
  
public boolean findFromMapContainsKey(BenchmarkState x, int id) {  
    return x.elementToTrueMap.containsKey(id);  
}  
  
@Benchmark  
public void fromSet(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.elementLookups) {  
        blackhole.consume(findFromSet(x, id));  
    }  
}  
  
public boolean findFromSet(BenchmarkState x, int id) {  
    return x.elementSet.contains(id);  
}  
  
@Benchmark  
public void searchBinarySearch(BenchmarkState x, Blackhole blackhole) {  
    for (final int id: x.elementLookups) {  
        blackhole.consume(findFromBinarySearch(x, id));  
    }  
}  
  
public boolean findFromBinarySearch(BenchmarkState x, int id) {  
    return Arrays.binarySearch(x.elementsSorted, id) >= 0; // a non-negative index means found (index 0 is valid)  
}

This time, the linear array search shows better or roughly equal throughput compared with HashMap and HashSet for up to 20 elements. Another interesting thing to notice is that HashMap.containsKey is faster than getting a boolean value from the map, and even faster than HashSet.contains. Binary search provides good performance up to 30 elements. However, by 40 elements HashMap.containsKey is the clear winner.

Conclusion

This test covered a lot of ground, but there are many more things we could try.

We can conclude that, on my machine, a small number of elements (10 or fewer) can be kept in arrays with reasonable lookup performance. Column-based searching is also an interesting technique that can deliver good results by letting lookups run over primitive arrays. For Strings, however, HashMap is faster even with a small number of elements.

The full source code is available on GitHub.

What do you think? Please leave a comment below.

If you enjoyed this post, check out this article on Java Reflection. We have tutorials on Python, Java and other coding tips you’ll love!

References

[1] “java – Why is processing a sorted array faster than processing an unsorted array?,” Stack Overflow. [Online]. Available: https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array. [Accessed: 15-Mar-2020]
