Why use ArrayList instead of array with append()?

edited March 2019 in Common Questions

In the forums, it is often advised to drop append() in favor of ArrayList.

A legitimate question is: "Why?"...

append()

append() was designed in the early days of Processing, when the use of objects and casting was avoided, and it was meant to be simpler to use than array lists.

Now, given the number of questions about append() in the forums, I am not sure this goal has been attained. Common errors: calling append() without assigning the result back to the array variable, and forgetting the cast needed with arrays of objects.

Moreover, append() is very inefficient. This doesn't matter much with small arrays of primitive types (like int, float, char), but it is important to be aware of the issue.

To better understand the problem, you have to know how append() works. It takes an array and a new element, and returns a new array with this element at the end. To do this, it creates a new array one element larger than the given array, copies the content of the old array into the new one, and finally adds the given element and returns the result.
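The steps above can be sketched in plain Java. This is not Processing's actual source: the class AppendSketch and its two append() methods are hypothetical stand-ins that only mimic the behavior described here, written so the example runs outside a sketch. It also shows the two common mistakes (discarding the result, forgetting the cast).

```java
import java.lang.reflect.Array;
import java.util.Arrays;

// Hypothetical stand-ins for Processing's append(); illustrative only.
class AppendSketch {
  // Primitive version: returns a NEW array one element longer than the input.
  static int[] append(int[] array, int value) {
    int[] bigger = Arrays.copyOf(array, array.length + 1); // allocate and copy
    bigger[array.length] = value;                          // new element goes last
    return bigger;
  }

  // Object version: the declared return type is Object, hence the cast at the call site.
  static Object append(Object array, Object value) {
    int length = Array.getLength(array);
    Object bigger = Array.newInstance(array.getClass().getComponentType(), length + 1);
    System.arraycopy(array, 0, bigger, 0, length);
    Array.set(bigger, length, value);
    return bigger;
  }

  public static void main(String[] args) {
    int[] numbers = { 1, 2, 3 };
    append(numbers, 4);                  // Mistake: result discarded, numbers is unchanged
    System.out.println(numbers.length);  // prints 3
    numbers = append(numbers, 4);        // Correct: assign the returned array back
    System.out.println(numbers.length);  // prints 4

    String[] names = { "a", "b" };
    names = (String[]) append(names, "c"); // Cast needed with arrays of objects
    System.out.println(names.length);      // prints 3
  }
}
```

Every call allocates a whole new array and copies all existing elements, whatever the size of the input.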

This results in a lot of memory being allocated (you have at least two copies of the array in memory, and often many more when using append() in a loop) and a lot of CPU time spent in the copy process, and later in the cleanup (garbage collection) of the intermediate arrays.

There is nothing that can be done to improve this process, as the constraint of append() is to return a plain Java array, whose length field must be equal to the number of items in the array.

ArrayList

ArrayList actually works in a similar way, with an additional optimization, and an additional inefficiency...

Its back-end storage is also an array, of objects. It also grows the array (i.e. makes a bigger one and copies the old content there) when needed.

The optimization: it creates the array bigger than needed. The initial size can be given in one of the constructors, when you are sure of the minimal or typical size of your data. It grows the array only when this size is reached (or nearly), and then it makes a much larger array, so that several items can be added before another resize is needed.

ArrayList uses a heuristic to determine how much the array will grow: too much would be a waste of memory, too little would cause too frequent resizes. In general, the bigger the array is, the more extra memory is allocated (as a big list is likely to grow much more). Details depend on the implementation.
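To get a feeling for the difference, here is a minimal sketch of a growable list of ints that counts how many elements it has to copy. GrowthSketch is a hypothetical class, and doubling the capacity is only an assumption made for the illustration: the real ArrayList uses its own growth rule, which depends on the implementation.

```java
// Illustrative only: compares growing by one element (like append())
// with growing by doubling (an assumed policy, not ArrayList's exact rule).
class GrowthSketch {
  private int[] storage;
  private int size = 0;
  private final boolean doubling;
  long copiedElements = 0; // total number of elements moved during resizes

  GrowthSketch(int initialCapacity, boolean doubling) {
    this.storage = new int[Math.max(1, initialCapacity)];
    this.doubling = doubling;
  }

  void add(int value) {
    if (size == storage.length) { // full: make a bigger array and copy
      int newCapacity = doubling ? storage.length * 2 : storage.length + 1;
      int[] bigger = new int[newCapacity];
      System.arraycopy(storage, 0, bigger, 0, size);
      copiedElements += size;
      storage = bigger;
    }
    storage[size++] = value;
  }

  int size() {
    return size;
  }

  // Adds count elements to a list starting at capacity 1 and reports the copies made.
  static long copiesFor(int count, boolean doubling) {
    GrowthSketch list = new GrowthSketch(1, doubling);
    for (int i = 0; i < count; i++) {
      list.add(i);
    }
    return list.copiedElements;
  }

  public static void main(String[] args) {
    System.out.println(copiesFor(1000, false)); // prints 499500
    System.out.println(copiesFor(1000, true));  // prints 1023
  }
}
```

With the real class, the initial size is passed to the constructor, for example new ArrayList<String>(1000), which avoids the first resizes altogether.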

The inefficiency: ArrayList stores only objects. So if you need to store primitive types, they have to be wrapped in their respective wrapper objects (Integer for int, Float for float, Character for char, etc.). These objects add the Object storage overhead to the primitive value.

This results in consuming much more memory than needed, which can be a problem with very large lists. Moreover, getting the primitive value out of the object still has a small speed cost.
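A small sketch of this wrapping, in plain Java (BoxingSketch and sum() are names made up for the example). The compiler inserts the conversions between int and Integer by itself, so the cost is easy to overlook:

```java
import java.util.ArrayList;

// Illustrative only: storing ints in an ArrayList goes through Integer objects.
class BoxingSketch {
  static int sum(ArrayList<Integer> values) {
    int total = 0;
    for (int value : values) { // each Integer is unboxed to an int here
      total += value;
    }
    return total;
  }

  public static void main(String[] args) {
    // ArrayList<int> does not compile: the type parameter must be an object type
    ArrayList<Integer> values = new ArrayList<Integer>();
    for (int i = 1; i <= 5; i++) {
      values.add(i); // autoboxing: the int is wrapped in an Integer object
    }
    int first = values.get(0); // unboxing: the int is taken out of the Integer
    System.out.println(first + " " + sum(values)); // prints "1 15"
  }
}
```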

Java doesn't offer array lists of primitive values: it would have to provide one per type, resulting in a lot of code duplication. Some libraries, like ArrayUtils in Apache Commons, offer helpers for working with primitive arrays, if you really need them.

Anyway, array lists are rarely used to store primitives. The most common usage is to store objects, where they shine.

And arrays aren't obsolete: if you know the size of your data in advance, use arrays. They are the most efficient (both in access speed and in memory) and their syntax is simple. You can also make an oversized array and maintain a counter of the slots currently in use.
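The oversized array with a counter could look like the following sketch. CountedArraySketch and its methods are hypothetical names, and refusing new values once the array is full is just one possible choice:

```java
// Illustrative only: a fixed-capacity buffer backed by an oversized array plus a counter.
class CountedArraySketch {
  private final float[] values; // allocated once, never resized
  private int count = 0;        // number of slots actually in use

  CountedArraySketch(int capacity) {
    values = new float[capacity];
  }

  // Returns false instead of growing when the array is full.
  boolean add(float value) {
    if (count == values.length) {
      return false;
    }
    values[count++] = value;
    return true;
  }

  float get(int index) {
    if (index < 0 || index >= count) {
      throw new IndexOutOfBoundsException("index " + index + ", count " + count);
    }
    return values[index];
  }

  int count() {
    return count;
  }

  public static void main(String[] args) {
    CountedArraySketch samples = new CountedArraySketch(3);
    samples.add(1.5f);
    samples.add(2.5f);
    float total = 0;
    // Loop up to count(), not up to the capacity of the array
    for (int i = 0; i < samples.count(); i++) {
      total += samples.get(i);
    }
    System.out.println(samples.count() + " " + total); // prints "2 4.0"
  }
}
```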

From array lists to pure arrays

Some functions want arrays, so we can't pass them an array list. A common example is saveStrings().

If you have an array list, it is easy to create an array out of it:

String[] arrayOfStrings = arrayListOfStrings.toArray(new String[arrayListOfStrings.size()]);
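Here is the same conversion as a small complete example in plain Java. ToArraySketch and toStringArray() are names made up for the illustration; toArray() itself is the standard ArrayList method used in the line above.

```java
import java.util.ArrayList;

// Illustrative only: converting an ArrayList<String> to a String[].
class ToArraySketch {
  static String[] toStringArray(ArrayList<String> list) {
    // The array passed as argument tells toArray() which type of array to return
    return list.toArray(new String[list.size()]);
  }

  public static void main(String[] args) {
    ArrayList<String> lines = new ArrayList<String>();
    lines.add("alpha");
    lines.add("beta");
    String[] array = toStringArray(lines);
    System.out.println(array.length + " " + String.join(",", array)); // prints "2 alpha,beta"
  }
}
```

The resulting array can then be given to a function expecting a String[], such as saveStrings() in a sketch.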

Typed collections

A last note: in old Processing sketches (e.g. in the archived forums), you might find the old, untyped version of ArrayList. With it, you have to cast the value you get back to its original type.

Code in the old style looks like:

// A common convention is to use plural names for arrays and array lists
ArrayList images = new ArrayList(); // Array list of PImage: a comment is needed to indicate what it holds
images.add(someImage);

// Later...
PImage img = (PImage) images.get(n); // We get a very generic Object, we must cast back to the real type

// Looping
for (int i = 0; i < images.size(); i++) // An ArrayList has size(), not length
{
  // Get the value at this index
  PImage img = (PImage) images.get(i);
  // use img
}

Before Java 1.5, collections like ArrayList were untyped, meaning you could put anything in them: they stored every item as Object, the base type of any object in Java. To be able to use an object you got out of the collection, you had to cast it back, to tell the compiler what its real type was. In practice (good practice, at least) you always store objects of the same type in a collection, so it was more an inconvenience than an advantage.

In recent Processing versions you can use generics to indicate what you store in the list. This is the type between angle brackets following the collection type:

// This array list will store only PImage objects. Note that you can put sub-types in them: a PGraphics is OK there!
ArrayList<PImage> images = new ArrayList<PImage>();
images.add(someImage);

// Later...
PImage img = images.get(n); // See mom, no cast!

// If you don't need an index in the loop, you can use the new foreach syntax:
for (PImage img : images) // Combines the above for with the img = images.get() statement
{
  // use img directly
}

Note: some collections like HashMap need two types (i.e. key and value). Syntax is: HashMap<String, PVector>.
You can also make collections of collections: HashMap<String, ArrayList<PVector>>.
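As an illustration of a collection of collections, here is a sketch in plain Java that groups words by their first letter. GroupingSketch and groupByFirstLetter() are hypothetical names, and String is used instead of PVector only so the example runs outside a sketch:

```java
import java.util.ArrayList;
import java.util.HashMap;

// Illustrative only: a HashMap whose values are typed ArrayLists.
class GroupingSketch {
  static HashMap<String, ArrayList<String>> groupByFirstLetter(String[] words) {
    HashMap<String, ArrayList<String>> groups = new HashMap<String, ArrayList<String>>();
    for (String word : words) {
      if (word.isEmpty()) {
        continue; // no first letter to use as a key
      }
      String key = word.substring(0, 1);
      ArrayList<String> group = groups.get(key); // no cast needed
      if (group == null) { // first word with this letter: create its list
        group = new ArrayList<String>();
        groups.put(key, group);
      }
      group.add(word);
    }
    return groups;
  }

  public static void main(String[] args) {
    String[] words = { "apple", "avocado", "banana" };
    HashMap<String, ArrayList<String>> groups = groupByFirstLetter(words);
    System.out.println(groups.get("a")); // prints [apple, avocado]
    System.out.println(groups.get("b")); // prints [banana]
  }
}
```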
