ConjureCollections uses forEach where possible #2524

Open

schlosna wants to merge 4 commits into develop from davids/forEach

Contributor

schlosna commented May 6, 2025

Before this PR

ConjureCollections used iterator-based for loops.

After this PR

==COMMIT_MSG==
ConjureCollections uses Iterable#forEach where possible to minimize allocations and speed up iteration over collections.
==COMMIT_MSG==

Possible downsides?


          ConjureCollections uses forEach where possible

6d32889

changelog-app bot commented May 6, 2025 •

edited by schlosna

Generate changelog in `changelog/@unreleased`

What do the change types mean?

feature: A new feature of the service.
improvement: An incremental improvement in the functionality or operation of the service.
fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
and external consumers of the service's API (e.g. customer-written software or integrations).
deprecation: Advertises the intention to remove service functionality without any change to the
operation of the service itself.
manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
performing database surgery, ...) at the time of upgrade for it to succeed.
migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?

❗The break and manual task changelog types will result in a major release!
🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
✨ All others will result in a minor version release.

Type

Description

ConjureCollections uses Iterable#forEach where possible to minimize allocations and speed up iteration over collections.

svc-changelog and others added 3 commits

May 9, 2025 05:06


          Add generated changelog entries

accf4c4


          Bulk add

ea9b77f


          cleanup

abe105f

schlosna added merge when ready autorelease labels

aldexis reviewed

conjure-lib/src/main/java/com/palantir/conjure/java/lib/internal/ConjureCollections.java

Comment on lines +78 to +79

		Preconditions.checkNotNull(elementsToAdd, "elementsToAdd cannot be null")
		.forEach(addTo::add);

aldexis May 14, 2025

It seems like the default forEach on Iterable does

    default void forEach(Consumer<? super T> action) {
        Objects.requireNonNull(action);
        for (T t : this) {
            action.accept(t);
        }
    }

Do you know for which Iterable types you expect this to be an improvement? (I've found a couple, but it's not always implemented differently)

(I still think it's better because in the worst case, it will be the same, but I don't believe this will necessarily net us benefits in most cases)
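One way to check the point above is to count `iterator()` calls. The sketch below is illustrative only: `ForEachIteratorProbe`, `CountingIterable`, and `CountingArrayList` are hypothetical names, not part of ConjureCollections. It compares a plain `Iterable` that inherits the default `forEach` with an `ArrayList` subclass that inherits `ArrayList`'s override.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical probe class, for illustration only.
final class ForEachIteratorProbe {

    /** An Iterable that inherits the default forEach and counts iterator() calls. */
    static final class CountingIterable<T> implements Iterable<T> {
        private final List<T> delegate;
        int iteratorCalls;

        CountingIterable(List<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public Iterator<T> iterator() {
            iteratorCalls++;
            return delegate.iterator();
        }
    }

    /** An ArrayList that counts iterator() calls; forEach is inherited from ArrayList. */
    static final class CountingArrayList<T> extends ArrayList<T> {
        int iteratorCalls;

        CountingArrayList(List<T> elements) {
            super(elements);
        }

        @Override
        public Iterator<T> iterator() {
            iteratorCalls++;
            return super.iterator();
        }
    }

    /** Runs the default Iterable#forEach and reports how often iterator() was called. */
    static int iteratorCallsForDefaultForEach() {
        CountingIterable<String> iterable = new CountingIterable<>(Arrays.asList("a", "b"));
        iterable.forEach(element -> {});
        return iterable.iteratorCalls;
    }

    /** Runs ArrayList#forEach and reports how often iterator() was called. */
    static int iteratorCallsForArrayListForEach() {
        CountingArrayList<String> list = new CountingArrayList<>(Arrays.asList("a", "b"));
        list.forEach(element -> {});
        return list.iteratorCalls;
    }

    public static void main(String[] args) {
        System.out.println("default forEach iterator() calls: " + iteratorCallsForDefaultForEach());
        System.out.println("ArrayList forEach iterator() calls: " + iteratorCallsForArrayListForEach());
    }
}
```

The default implementation goes through `iterator()` once, while `ArrayList#forEach` walks its backing array directly, so a gain is only expected for types that override `forEach`.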

conjure-lib/src/main/java/com/palantir/conjure/java/lib/internal/ConjureCollections.java

Comment on lines 148 to +149

                       List<T> arrayList = newList(iterable);
-                      for (T item : arrayList) {
-                          Preconditions.checkNotNull(item, "iterable cannot contain null elements");
-                      }
+                      arrayList.forEach(ConjureCollections::checkNotNullElement);

aldexis May 14, 2025

Note to self: ArrayList overrides forEach and does not allocate an iterator, so even though we already loop over the contents in newList(iterable), in the worst case, this won't allocate two iterators

conjure-lib/src/main/java/com/palantir/conjure/java/lib/internal/ConjureCollections.java

-                      if (iterable instanceof Collection) {
-                          return new ArrayList<>((Collection<T>) iterable);
+                      if (iterable instanceof Collection<? extends T> collection) {
+                          return new ArrayList<>(collection);

aldexis May 14, 2025

Not for this PR, but more for my understanding/knowledge, since we're looking at optimizing allocations: From what I can tell, this might allocate the underlying array twice?
This does

    public ArrayList(Collection<? extends E> c) {
        Object[] a = c.toArray();
        if ((size = a.length) != 0) {
            if (c.getClass() == ArrayList.class) {
                elementData = a;
            } else {
                elementData = Arrays.copyOf(a, size, Object[].class);
            }
        } else {
            // replace with empty array.
            elementData = EMPTY_ELEMENTDATA;
        }
    }

and Collection#toArray says

The returned array will be "safe" in that no references to it are maintained by this collection. (In other words, this method must allocate a new array even if this collection is backed by an array). The caller is thus free to modify the returned array.

Unfortunately, it seems like addAll also does the same 🤔 (so using new ArrayList(collection.size()) and addAll wouldn't help either)

    public boolean addAll(Collection<? extends E> c) {
        Object[] a = c.toArray();
        modCount++;
        int numNew = a.length;
        if (numNew == 0)
            return false;
        Object[] elementData;
        final int s;
        if (numNew > (elementData = this.elementData).length - (s = size))
            elementData = grow(s + numNew);
        System.arraycopy(a, 0, elementData, s, numNew);
        size = s + numNew;
        return true;
    }

Curious if you have any knowledge about this and why we would be allocating the same array twice
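For what it's worth, the `toArray()` path quoted above can be observed without reading JDK internals. The sketch below uses hypothetical names (`CopyConstructorProbe`, `CountingSource`) and counts how often the source collection's `toArray()` and `iterator()` are called by the copy constructor and by `addAll`. It does not measure allocations directly; it only confirms that both paths go through `toArray()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical probe class, for illustration only.
final class CopyConstructorProbe {

    /** A source collection that records toArray() and iterator() calls. */
    static final class CountingSource<T> extends ArrayList<T> {
        int toArrayCalls;
        int iteratorCalls;

        @Override
        public Object[] toArray() {
            toArrayCalls++;
            return super.toArray(); // ArrayList returns a fresh copy of its backing array
        }

        @Override
        public Iterator<T> iterator() {
            iteratorCalls++;
            return super.iterator();
        }
    }

    private static CountingSource<String> newSource() {
        CountingSource<String> source = new CountingSource<>();
        source.add("a");
        source.add("b");
        return source;
    }

    /** Copies the source with new ArrayList<>(source) and returns the source for inspection. */
    static CountingSource<String> probeCopyConstructor() {
        CountingSource<String> source = newSource();
        List<String> copy = new ArrayList<>(source);
        System.out.println("copy size: " + copy.size());
        return source;
    }

    /** Appends the source with addAll and returns the source for inspection. */
    static CountingSource<String> probeAddAll() {
        CountingSource<String> source = newSource();
        List<String> target = new ArrayList<>(source.size());
        target.addAll(source);
        System.out.println("target size: " + target.size());
        return source;
    }

    public static void main(String[] args) {
        CountingSource<String> copied = probeCopyConstructor();
        System.out.println("constructor: toArray=" + copied.toArrayCalls + ", iterator=" + copied.iteratorCalls);
        CountingSource<String> appended = probeAddAll();
        System.out.println("addAll: toArray=" + appended.toArrayCalls + ", iterator=" + appended.iteratorCalls);
    }
}
```

Because `CountingSource` is not exactly `ArrayList.class`, the quoted constructor would take the `Arrays.copyOf` branch after `toArray()` has already produced a fresh array.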

conjure-lib/src/main/java/com/palantir/conjure/java/lib/internal/ConjureCollections.java

-                          Preconditions.checkNotNull(item, "iterable cannot contain null elements");
-                      }
+                      set.forEach(ConjureCollections::checkNotNullElement);

aldexis May 14, 2025

Fwiw, LinkedHashSet does not seem to override Iterable's default forEach implementation, so we'll still create the iterator here afaict

carterkozak reviewed

conjure-lib/src/main/java/com/palantir/conjure/java/lib/internal/ConjureCollections.java

Comment on lines +86 to +104

+                          addTo.addAll(new AbstractCollection<>() {
+                              @Override
+                              public Iterator<T> iterator() {
+                                  return new NonNullIterator<>(collection.iterator());
+                              }
+                              @Override
+                              public Object[] toArray() {
+                                  Object[] array = collection.toArray();
+                                  for (Object element : array) {
+                                      checkNotNullElement(element);
+                                  }
+                                  return array;
+                              }
+                              @Override
+                              public int size() {
+                                  return collection.size();
+                              }

Contributor

carterkozak May 20, 2025

In an ideal world I think we'd implement this within specialized non-null collections to avoid the new anonymous wrapper allocation on a per-call basis. Getting there would be a little tricky because this API is fairly specialized; however, it's in the internal package, so we can make some assumptions about how it's used.
Probably not a substantial optimization.
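The wrapper in the diff under review refers to `NonNullIterator`, whose body is not shown in this excerpt. A minimal sketch follows, assuming it simply delegates and rejects null elements as they are read. The class below is a guess for illustration, not the PR's implementation, and it uses `Objects.requireNonNull` in place of the project's `Preconditions.checkNotNull`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Objects;

// Hypothetical sketch; the real NonNullIterator in the PR may differ.
final class NonNullIteratorSketch {

    /** Wraps an iterator and fails fast when the delegate yields a null element. */
    static final class NonNullIterator<T> implements Iterator<T> {
        private final Iterator<? extends T> delegate;

        NonNullIterator(Iterator<? extends T> delegate) {
            this.delegate = Objects.requireNonNull(delegate, "delegate cannot be null");
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            // Assumed message; the PR's actual wording is not visible here.
            return Objects.requireNonNull(delegate.next(), "collection cannot contain null elements");
        }
    }

    /** Returns true if iterating past a null element throws NullPointerException. */
    static boolean rejectsNullElement() {
        Iterator<String> iterator = new NonNullIterator<>(Arrays.asList("a", null).iterator());
        iterator.next(); // "a" passes the check
        try {
            iterator.next(); // null should be rejected
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejects null element: " + rejectsNullElement());
    }
}
```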

Contributor

carterkozak May 20, 2025

I suppose another option is to make the assumption that toArray will be used, and explicitly handle that ourselves:

T[] values = elementsToAdd.toArray();
addTo.ensureCapacity[IfPossible](values.length);
for (T element : values) { // no more iterator allocation because we're iterating over an array
    Preconditions.checkNotNull(element, "elementsToAdd cannot contain null elements");
    addTo.add(element);
}
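A compilable take on the idea above might look like the following. This is only a sketch under stated assumptions: `ToArrayAddAllSketch` and `addAllNonNull` are hypothetical names, `toArray()` returns `Object[]` so each element is cast back to `T`, the "ensure capacity if possible" step is approximated with an `ArrayList` check, and `Objects.requireNonNull` stands in for `Preconditions.checkNotNull` so the block has no external dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the toArray-based approach; not the PR's code.
final class ToArrayAddAllSketch {

    /** Adds every element to addTo, rejecting nulls, without allocating an iterator. */
    static <T> void addAllNonNull(Collection<T> addTo, Collection<? extends T> elementsToAdd) {
        Objects.requireNonNull(elementsToAdd, "elementsToAdd cannot be null");
        Object[] values = elementsToAdd.toArray();
        // Pre-size the target when it is an ArrayList; other targets grow on their own.
        if (addTo instanceof ArrayList) {
            ((ArrayList<?>) addTo).ensureCapacity(addTo.size() + values.length);
        }
        // Iterating over an array allocates no iterator.
        for (Object value : values) {
            Objects.requireNonNull(value, "elementsToAdd cannot contain null elements");
            @SuppressWarnings("unchecked") // safe: value came from a Collection<? extends T>
            T element = (T) value;
            addTo.add(element);
        }
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<>(Arrays.asList("a"));
        addAllNonNull(target, Arrays.asList("b", "c"));
        System.out.println(target);
    }
}
```

One consequence of checking inside the loop, as in the snippet above, is that elements preceding a null are already added when the exception is thrown; validating the array in a first pass would avoid that at the cost of a second loop.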

Labels

autorelease merge when ready