Java standard collection types and when to use them

Here I go over some Java collection classes, sharing tips and experiences to help you decide which one to use.
private static final Random RANDOM = new Random();
public static <T> Collection<T> whichCollectionTypeShouldIUse() {
	int r = RANDOM.nextInt(8);
	return switch(r) {
		case 0 -> new ArrayList<>();
		case 1 -> new LinkedList<>();
		case 2 -> new CopyOnWriteArrayList<>();
		case 3 -> new HashSet<>();
		case 4 -> new LinkedHashSet<>();
		case 5 -> new TreeSet<>();
		case 6 -> new ArrayDeque<>();
		case 7 -> new PriorityQueue<>();
		default -> throw new RuntimeException("I am confused.");
	}
}

I realize it’s been a few months since I have written my last blog post, and I need to get back into the habit of providing some good content. To keep things simple, let’s talk about some of the various Collection classes provided by Java’s standard library. Don’t know which one best fits your use case? Use this guide to help you out!

Here I assume that you have some knowledge of data structures. I do not find value in simply restating the basics that you can find all over the Internet and introductory computer science courses. Deciding which data structure is best is sometimes more difficult than consulting a cheat sheet, and it may take some trial and error depending on the situation. Instead, I’ll give some tips I’ve learned throughout the years to help you make your decision.

ArrayList

This is by far the most common List implementation I’ve seen used. ArrayLists are generally quite fast because they are backed by arrays, which are even faster. A very common use case is querying a database, storing the results into an ArrayList, and then iterating through that list to perform some kind of processing (such as JSON serialization).

You might think, aren’t LinkedLists faster for inserting elements repeatedly at the end of the list? Actually, they are horrendously slower. With each add a LinkedList has to allocate memory for the new nodes (including references for the element itself and pointers to next and previous nodes), and the memory is scattered throughout the heap. In contrast, an ArrayList‘s add operation on average involves little more than setting a value at some index in the underlying array, and in the worst case where the ArrayList‘s capacity must grow to accommodate more elements, the old array’s block of elements is copied into the new larger array via System.arraycopy, which is a very fast native call. A LinkedList requires significantly more memory than an ArrayList.

So yeah, an ArrayList should be used in most situations. But there are some situations in which an ArrayList is not ideal:

  • You need to insert or remove elements somewhere other than at the end.
  • A multithreaded application needs to iterate through and write to the list at the same time (which would cause a ConcurrentModificationException).
  • The list always has a fixed size (in which case an array would be better).
  • The list consistently needs to be sorted after each write.
  • The list cannot have duplicates.

LinkedList

I’ve seen LinkedLists in industry code, but they are not super common. They are rarely used because the vast majority of the time they are a terrible idea. If I were to do a code review and see a LinkedList get instantiated, the first question that would come to my mind would be, why didn’t he use an ArrayList?

The seemingly nice thing about LinkedLists is that insertion and deletion in the middle of the list is rather easy—just update a few pointers, and there is no need to shift a bunch of elements around in memory. If you need to frequently perform insertions and deletions in the middle, then the flexibility of the LinkedList makes it a great choice. However, in my experience it is rare to see such insertions or deletions in production code; seldom have I found lists not populated from beginning to end correctly the first time, unless I am already dealing with abominable legacy code.

LinkedList is also a good candidate if you are creating a first-in-first-out (FIFO) queue or last-in-first-out (LIFO) stack; however, ArrayDeque performs better than LinkedList in those situations. Use LinkedList if the data structure must implement List; prefer ArrayDeque otherwise.

In multithreaded applications where threads would insert into and remove from the queue simultaneously, use ConcurrentLinkedQueue or ConcurrentLinkedDeque instead.

CopyOnWriteArrayList

In stateful web services—where it is typical to cache so many things into RAM—it is easy to inadvertently create the conditions that result in ConcurrentModificationExceptions. In a common scenario that is often hard to reproduce on demand or catch in a code review, one thread is making modifications to the list, while another thread is looping through that same list.

A quick fix for this problem might be to switch a list’s type from ArrayList to CopyOnWriteArrayList, but there are factors to consider before deciding on that change. A CopyOnWriteArrayList can be a good idea if:

  • The application is multithreaded.
  • The list is always small.
  • There are much more list traversals than there are writes.
  • The writes involve minor changes to the list as opposed to completely overwriting the list.

Other alternatives to fixing ConcurrentModificationException bugs may be better depending on the situation. If a list needs to be completely overwritten, you could create and populate a temporary list and then swap out the reference to the old list with a reference to the new one.

HashSet

HashSets are awesome! If you’ve ever used a HashSet before, you know what I mean. The fast, constant-time lookup operations make HashSets extremely useful.

Some common applications of HashSets are:

  • Checking if something is in a blacklist or whitelist
  • Checking if a node has already been visited while searching through a graph
  • A web service responding with a list of results with neither duplicates nor any regard whatsoever for the result ordering.

The big caveat is that whatever you put into a HashSet needs to implement equals(Object) and hashCode(), remembering to not break the contract that if two objects are equal then they must have the same hash code. If these are not properly implemented, then the HashSet is as bad as broken, if not worse. To avoid the hassle of writing boilerplate stuff, you could just have Lombok do it for you, via the @EqualsAndHashCode, @Data, or @Value annotations.

LinkedHashSet

An extension of HashSet, a LinkedHashSet is similar to a HashSet, except that it orders its elements in the same order that they were inserted. A LinkedHashSet is just as awesome as a HashSet, but LinkedHashSet is not used nearly as often. When the order does not matter, a HashSet should be sufficient, and that is almost always the case. If I need the elements to be in insertion order, a LinkedHashSet is the way to go.

On a few occasions I have received bug reports where a list of something on a page was not in the correct order, or that the order makes no sense. Whenever the investigation had revealed that the list was powered by a HashSet, the easy fix was to change the HashSet into a LinkedHashSet.

TreeSet

When a set needs to be sorted, the seemingly obvious solution is to use a TreeSet. A nice thing about TreeSet is that it keeps the tree sorted for you whenever you perform insertions or deletions. TreeSet also lets you decide how the elements should be sorted, whether it be by their natural ordering or some other comparator of your choice.

In reality, however, I have rarely seen a TreeSet be used. It is more common to populate an ArrayList and sort it via Collections.sort(List) than to use a TreeSet. The ArrayList route performs much better if you only have to sort once and you don’t need to modify the list further.1

Prefer TreeSet over sorting an array or ArrayList if elements are added or removed often and the set needs to remain sorted across edits. If you need thread safety, consider using ConcurrentSkipListSet instead.

ArrayDeque

ArrayDeque was introduced in Java 6 just in time before I began college, so I should have had more exposure to it during college. My Data Structures textbook says something about an “array queue,” and I vaguely remember being lectured about it in class, but I was never asked to implement one. Having not been aware of ArrayDeque at the time, I ended up settling for the more familiar LinkedList class whenever it made sense.

Then one day I was implementing a solution to a practice programming problem about “stacking boxes”. It became obvious that I needed a stack data structure, and I quickly discovered the Stack class in the java.util package. To make sure the Stack class was what I thought it was, I consulted the class’s Javadocs in the Java 7 API and stumbled upon the following:

A more complete and consistent set of LIFO stack operations is provided by the Deque interface and its implementations, which should be used in preference to this class. For example:

Deque<Integer> stack = new ArrayDeque<Integer>();2

And that is how I became aware of the ArrayDeque class. After researching this class some more, the big takeaway I got was to always use ArrayDeque instead of Stack; Stack‘s almost always worthless thread safety makes this decision a no-brainer.

Now, back to array queues. Inserting an element into the beginning of an ArrayList is really expensive, so the idea behind the array queue is this: shift the “head” of the array when inserting or removing elements at the front. For example, if I add the number 0 to an existing array queue with elements 1 through 5 and a capacity of 8, the queue’s underlying array would be populated as follows:

123450 (head)

As we add more elements and the queue continues to grow, a new larger array is allocated, and the new elements are copied over starting from the head and wrapping around from the end of the array to the beginning.

Expand the above idea to double-ended queues, and you get the ArrayDeque class. It turns out that ArrayDeque outperforms LinkedList at almost every turn. In fact, the Java documentation encourages you to choose that class when working with stacks and queues.

ArrayDeque (and less so LinkedList) can be quite useful for the following applications:

  • Performing a breadth-first search or depth-first search
  • First-come-first-serve
  • Making a snake move
  • Building a parser

The two big drawbacks of ArrayDeque is that (1) it doesn’t implement List, and (2) you can’t use it to insert elements into the middle of the list. If either of those is a problem, consider using a LinkedList instead.

PriorityQueue

I vaguely remember my data structures professor giving an analogy that illustrates how a priority queue works:

There are two people standing in line at a copying machine, each waiting their turn. The first person in line needs to copy 100 pages. The second person, who arrived much later, only needs to copy one page. The copying machine finishes its job, and now the question is, who should go next? Clearly, it should be the second person. Why make him wait forever for 100 pages to be copied if he only needs to copy one page? Let him quickly copy the one page, then the first person can do the much larger job.

Now, if the copying machine were to operate strictly on a first-come-first-served basis, then the first person would copy his 100 pages first, and the second person would wait.3

In the above example, we know for sure that the priority is based on how many sheets the person needs to copy; how long the person has been standing in line could also factor in to the priority, but the shortness of the time required to complete the second person’s job is clearly carrying a much heavier weight.

Elements in a PriorityQueue are released from the queue in either their natural ordering or the ordering specified by the Comparator passed into the constructor. In other words, a PriorityQueue allows us to prioritize some objects over others (for processing) in some optimal or desired fashion that is not necessarily first-come-first-served.

How is this different from a TreeSet, you might ask? The difference lies in how these data structures behave, what they are used for, and how the elements are accessed. Simply put, a PriorityQueue behaves like a queue, and a TreeSet behaves like a set that is structured as a tree.

I have used PriorityQueue very few times. The last time I used PriorityQueue was in implementing the A* search algorithm.

If multiple threads need to access the same queue, use PriorityBlockingQueue.

References

  1. “Is it better to use a TreeSet or ArrayList when using a custom comparator”, Stack Overflow. Accessed July 12, 2024 at https://stackoverflow.com/questions/24371204/is-it-better-to-use-a-treeset-or-arraylist-when-using-a-custom-comparator. ↩︎
  2. “Stack (Java Platform SE 7)”, Oracle. Accessed July 11, 2024 at https://docs.oracle.com/javase/7/docs/api/java/util/Stack.html. ↩︎
  3. Barker, Cory. Personal communication, 2007. ↩︎

Leave a Reply

Your email address will not be published. Required fields are marked *