private static final Random RANDOM = new Random();
public static <T> Collection<T> whichCollectionTypeShouldIUse() {
int r = RANDOM.nextInt(8);
return switch(r) {
case 0 -> new ArrayList<>();
case 1 -> new LinkedList<>();
case 2 -> new CopyOnWriteArrayList<>();
case 3 -> new HashSet<>();
case 4 -> new LinkedHashSet<>();
case 5 -> new TreeSet<>();
case 6 -> new ArrayDeque<>();
case 7 -> new PriorityQueue<>();
default -> throw new RuntimeException("I am confused.");
}
}
I realize it’s been a few months since I have written my last blog post, and I need to get back into the habit of providing some good content. To keep things simple, let’s talk about some of the various Collection
classes provided by Java’s standard library. Don’t know which one best fits your use case? Use this guide to help you out!
Here I assume that you have some knowledge of data structures. I do not find value in simply restating the basics that you can find all over the Internet and introductory computer science courses. Deciding which data structure is best is sometimes more difficult than consulting a cheat sheet, and it may take some trial and error depending on the situation. Instead, I’ll give some tips I’ve learned throughout the years to help you make your decision.
ArrayList
This is by far the most common List
implementation I’ve seen used. ArrayList
s are generally quite fast because they are backed by arrays, which are even faster. A very common use case is querying a database, storing the results into an ArrayList
, and then iterating through that list to perform some kind of processing (such as JSON serialization).
You might think, aren’t LinkedList
s faster for inserting elements repeatedly at the end of the list? Actually, they are horrendously slower. With each add a LinkedList
has to allocate memory for the new nodes (including references for the element itself and pointers to next and previous nodes), and the memory is scattered throughout the heap. In contrast, an ArrayList
‘s add operation on average involves little more than setting a value at some index in the underlying array, and in the worst case where the ArrayList
‘s capacity must grow to accommodate more elements, the old array’s block of elements is copied into the new larger array via System.arraycopy
, which is a very fast native call. A LinkedList
requires significantly more memory than an ArrayList
.
So yeah, an ArrayList
should be used in most situations. But there are some situations in which an ArrayList
is not ideal:
- You need to insert or remove elements somewhere other than at the end.
- A multithreaded application needs to iterate through and write to the list at the same time (which would cause a
ConcurrentModificationException
). - The list always has a fixed size (in which case an array would be better).
- The list consistently needs to be sorted after each write.
- The list cannot have duplicates.
LinkedList
I’ve seen LinkedList
s in industry code, but they are not super common. They are rarely used because the vast majority of the time they are a terrible idea. If I were to do a code review and see a LinkedList
get instantiated, the first question that would come to my mind would be, why didn’t he use an ArrayList
?
The seemingly nice thing about LinkedList
s is that insertion and deletion in the middle of the list is rather easy—just update a few pointers, and there is no need to shift a bunch of elements around in memory. If you need to frequently perform insertions and deletions in the middle, then the flexibility of the LinkedList
makes it a great choice. However, in my experience it is rare to see such insertions or deletions in production code; seldom have I found lists not populated from beginning to end correctly the first time, unless I am already dealing with abominable legacy code.
LinkedList
is also a good candidate if you are creating a first-in-first-out (FIFO) queue or last-in-first-out (LIFO) stack; however, ArrayDeque
performs better than LinkedList
in those situations. Use LinkedList
if the data structure must implement List
; prefer ArrayDeque
otherwise.
In multithreaded applications where threads would insert into and remove from the queue simultaneously, use ConcurrentLinkedQueue
or ConcurrentLinkedDeque
instead.
CopyOnWriteArrayList
In stateful web services—where it is typical to cache so many things into RAM—it is easy to inadvertently create the conditions that result in ConcurrentModificationException
s. In a common scenario that is often hard to reproduce on demand or catch in a code review, one thread is making modifications to the list, while another thread is looping through that same list.
A quick fix for this problem might be to switch a list’s type from ArrayList
to CopyOnWriteArrayList
, but there are factors to consider before deciding on that change. A CopyOnWriteArrayList
can be a good idea if:
- The application is multithreaded.
- The list is always small.
- There are much more list traversals than there are writes.
- The writes involve minor changes to the list as opposed to completely overwriting the list.
Other alternatives to fixing ConcurrentModificationException
bugs may be better depending on the situation. If a list needs to be completely overwritten, you could create and populate a temporary list and then swap out the reference to the old list with a reference to the new one.
HashSet
HashSet
s are awesome! If you’ve ever used a HashSet
before, you know what I mean. The fast, constant-time lookup operations make HashSet
s extremely useful.
Some common applications of HashSet
s are:
- Checking if something is in a blacklist or whitelist
- Checking if a node has already been visited while searching through a graph
- A web service responding with a list of results with neither duplicates nor any regard whatsoever for the result ordering.
The big caveat is that whatever you put into a HashSet
needs to implement equals(Object)
and hashCode()
, remembering to not break the contract that if two objects are equal then they must have the same hash code. If these are not properly implemented, then the HashSet
is as bad as broken, if not worse. To avoid the hassle of writing boilerplate stuff, you could just have Lombok do it for you, via the @EqualsAndHashCode
, @Data
, or @Value
annotations.
LinkedHashSet
An extension of HashSet
, a LinkedHashSet
is similar to a HashSet
, except that it orders its elements in the same order that they were inserted. A LinkedHashSet
is just as awesome as a HashSet
, but LinkedHashSet
is not used nearly as often. When the order does not matter, a HashSet
should be sufficient, and that is almost always the case. If I need the elements to be in insertion order, a LinkedHashSet
is the way to go.
On a few occasions I have received bug reports where a list of something on a page was not in the correct order, or that the order makes no sense. Whenever the investigation had revealed that the list was powered by a HashSet
, the easy fix was to change the HashSet
into a LinkedHashSet
.
TreeSet
When a set needs to be sorted, the seemingly obvious solution is to use a TreeSet
. A nice thing about TreeSet
is that it keeps the tree sorted for you whenever you perform insertions or deletions. TreeSet
also lets you decide how the elements should be sorted, whether it be by their natural ordering or some other comparator of your choice.
In reality, however, I have rarely seen a TreeSet
be used. It is more common to populate an ArrayList
and sort it via Collections.sort(List)
than to use a TreeSet
. The ArrayList
route performs much better if you only have to sort once and you don’t need to modify the list further.1
Prefer TreeSet
over sorting an array or ArrayList
if elements are added or removed often and the set needs to remain sorted across edits. If you need thread safety, consider using ConcurrentSkipListSet
instead.
ArrayDeque
ArrayDeque
was introduced in Java 6 just in time before I began college, so I should have had more exposure to it during college. My Data Structures textbook says something about an “array queue,” and I vaguely remember being lectured about it in class, but I was never asked to implement one. Having not been aware of ArrayDeque
at the time, I ended up settling for the more familiar LinkedList
class whenever it made sense.
Then one day I was implementing a solution to a practice programming problem about “stacking boxes”. It became obvious that I needed a stack data structure, and I quickly discovered the Stack
class in the java.util
package. To make sure the Stack
class was what I thought it was, I consulted the class’s Javadocs in the Java 7 API and stumbled upon the following:
A more complete and consistent set of LIFO stack operations is provided by the
Deque
interface and its implementations, which should be used in preference to this class. For example:
Deque<Integer> stack = new ArrayDeque<Integer>();
2
And that is how I became aware of the ArrayDeque
class. After researching this class some more, the big takeaway I got was to always use ArrayDeque
instead of Stack
; Stack
‘s almost always worthless thread safety makes this decision a no-brainer.
Now, back to array queues. Inserting an element into the beginning of an ArrayList
is really expensive, so the idea behind the array queue is this: shift the “head” of the array when inserting or removing elements at the front. For example, if I add the number 0 to an existing array queue with elements 1 through 5 and a capacity of 8, the queue’s underlying array would be populated as follows:
1 | 2 | 3 | 4 | 5 | – | – | 0 (head) |
As we add more elements and the queue continues to grow, a new larger array is allocated, and the new elements are copied over starting from the head and wrapping around from the end of the array to the beginning.
Expand the above idea to double-ended queues, and you get the ArrayDeque
class. It turns out that ArrayDeque
outperforms LinkedList
at almost every turn. In fact, the Java documentation encourages you to choose that class when working with stacks and queues.
ArrayDeque
(and less so LinkedList
) can be quite useful for the following applications:
- Performing a breadth-first search or depth-first search
- First-come-first-serve
- Making a snake move
- Building a parser
The two big drawbacks of ArrayDeque
is that (1) it doesn’t implement List
, and (2) you can’t use it to insert elements into the middle of the list. If either of those is a problem, consider using a LinkedList
instead.
PriorityQueue
I vaguely remember my data structures professor giving an analogy that illustrates how a priority queue works:
There are two people standing in line at a copying machine, each waiting their turn. The first person in line needs to copy 100 pages. The second person, who arrived much later, only needs to copy one page. The copying machine finishes its job, and now the question is, who should go next? Clearly, it should be the second person. Why make him wait forever for 100 pages to be copied if he only needs to copy one page? Let him quickly copy the one page, then the first person can do the much larger job.
Now, if the copying machine were to operate strictly on a first-come-first-served basis, then the first person would copy his 100 pages first, and the second person would wait.3
In the above example, we know for sure that the priority is based on how many sheets the person needs to copy; how long the person has been standing in line could also factor in to the priority, but the shortness of the time required to complete the second person’s job is clearly carrying a much heavier weight.
Elements in a PriorityQueue
are released from the queue in either their natural ordering or the ordering specified by the Comparator
passed into the constructor. In other words, a PriorityQueue
allows us to prioritize some objects over others (for processing) in some optimal or desired fashion that is not necessarily first-come-first-served.
How is this different from a TreeSet
, you might ask? The difference lies in how these data structures behave, what they are used for, and how the elements are accessed. Simply put, a PriorityQueue
behaves like a queue, and a TreeSet
behaves like a set that is structured as a tree.
I have used PriorityQueue
very few times. The last time I used PriorityQueue
was in implementing the A* search algorithm.
If multiple threads need to access the same queue, use PriorityBlockingQueue
.
References
- “Is it better to use a TreeSet or ArrayList when using a custom comparator”, Stack Overflow. Accessed July 12, 2024 at https://stackoverflow.com/questions/24371204/is-it-better-to-use-a-treeset-or-arraylist-when-using-a-custom-comparator. ↩︎
- “Stack (Java Platform SE 7)”, Oracle. Accessed July 11, 2024 at https://docs.oracle.com/javase/7/docs/api/java/util/Stack.html. ↩︎
- Barker, Cory. Personal communication, 2007. ↩︎