ConjureCollections uses forEach where possible #2524

Open
wants to merge 4 commits into
base: develop

Conversation

schlosna
Contributor

@schlosna schlosna commented May 6, 2025

Before this PR

ConjureCollections used iterator-based for loops.

After this PR

==COMMIT_MSG==
ConjureCollections uses Iterable#forEach where possible to minimize allocations and speed up iteration over collections.
==COMMIT_MSG==

Possible downsides?

@changelog-app

changelog-app bot commented May 6, 2025

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
    and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the
    operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
    performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

ConjureCollections uses Iterable#forEach where possible to minimize allocations and speed up iteration over collections.

Comment on lines +78 to +79
Preconditions.checkNotNull(elementsToAdd, "elementsToAdd cannot be null")
        .forEach(addTo::add);

It seems like the default forEach on Iterable does

    default void forEach(Consumer<? super T> action) {
        Objects.requireNonNull(action);
        for (T t : this) {
            action.accept(t);
        }
    }

Do you know for which iterable types you expect this to be an improvement? (I've found a couple, but it's not always implemented differently.)

(I still think it's better because in the worst case, it will be the same, but I don't believe this will necessarily net us benefits in most cases)
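One way to check which types would benefit is to ask, via reflection, where a given class gets its `forEach` from. The following is a minimal sketch; `ForEachOverrides` and `forEachDeclaringClass` are hypothetical names made up for illustration and are not part of ConjureCollections. If the declaring class comes back as `Iterable`, the type is using the default implementation quoted above.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.function.Consumer;

// Hypothetical helper, not part of the PR: reports where a type's forEach comes from.
final class ForEachOverrides {
    private ForEachOverrides() {}

    /** Returns the class or interface that declares the public forEach(Consumer) seen by {@code type}. */
    static Class<?> forEachDeclaringClass(Class<?> type) {
        try {
            return type.getMethod("forEach", Consumer.class).getDeclaringClass();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(type + " has no public forEach(Consumer)", e);
        }
    }

    public static void main(String[] args) {
        // ArrayList declares its own forEach, so the result is ArrayList itself.
        System.out.println(forEachDeclaringClass(ArrayList.class));
        // AbstractCollection inherits the default method from Iterable.
        System.out.println(forEachDeclaringClass(AbstractCollection.class));
        // The result for other types depends on the JDK in use, so it is printed, not assumed.
        System.out.println(forEachDeclaringClass(LinkedHashSet.class));
    }
}
```

Running this against the JDK the project targets gives a quick list of collection types for which the change can make a difference.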

Comment on lines 148 to +149
List<T> arrayList = newList(iterable);
for (T item : arrayList) {
Preconditions.checkNotNull(item, "iterable cannot contain null elements");
}

arrayList.forEach(ConjureCollections::checkNotNullElement);

Note to self: ArrayList overrides forEach and does not allocate an iterator, so even though we already loop over the contents in newList(iterable), in the worst case, this won't allocate two iterators
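The claim about `ArrayList` can be observed directly with a small experiment. This is a sketch only: `IteratorCountingList` and `ForEachIteratorDemo` are hypothetical classes written for this illustration. The subclass counts calls to `iterator()`, so any iteration path that goes through an iterator shows up in the counter.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical ArrayList subclass that records how often iterator() is called.
final class IteratorCountingList<T> extends ArrayList<T> {
    int iteratorCalls;

    IteratorCountingList(List<T> source) {
        super(source);
    }

    @Override
    public Iterator<T> iterator() {
        iteratorCalls++;
        return super.iterator();
    }
}

final class ForEachIteratorDemo {
    private ForEachIteratorDemo() {}

    /** Iterates with ArrayList#forEach and returns the number of iterator() calls observed. */
    static int iteratorCallsViaForEach(List<String> items) {
        IteratorCountingList<String> list = new IteratorCountingList<>(items);
        list.forEach(Objects::requireNonNull);
        return list.iteratorCalls;
    }

    /** Iterates with an enhanced for loop and returns the number of iterator() calls observed. */
    static int iteratorCallsViaEnhancedFor(List<String> items) {
        IteratorCountingList<String> list = new IteratorCountingList<>(items);
        for (String item : list) {
            Objects.requireNonNull(item);
        }
        return list.iteratorCalls;
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c");
        System.out.println("forEach: " + iteratorCallsViaForEach(items));
        System.out.println("enhanced for: " + iteratorCallsViaEnhancedFor(items));
    }
}
```

The enhanced for loop goes through `iterator()`, while `ArrayList#forEach` walks the backing array and never asks for one.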

if (iterable instanceof Collection) {
return new ArrayList<>((Collection<T>) iterable);
if (iterable instanceof Collection<? extends T> collection) {
return new ArrayList<>(collection);

Not for this PR, but more for my understanding/knowledge, since we're looking at optimizing allocations: From what I can tell, this might allocate the underlying array twice?
This does

    public ArrayList(Collection<? extends E> c) {
        Object[] a = c.toArray();
        if ((size = a.length) != 0) {
            if (c.getClass() == ArrayList.class) {
                elementData = a;
            } else {
                elementData = Arrays.copyOf(a, size, Object[].class);
            }
        } else {
            // replace with empty array.
            elementData = EMPTY_ELEMENTDATA;
        }
    }

and Collection#toArray says

The returned array will be "safe" in that no references to it are maintained by this collection. (In other words, this method must allocate a new array even if this collection is backed by an array). The caller is thus free to modify the returned array.

Unfortunately, it seems like addAll also does the same 🤔 (so using new ArrayList(collection.size()) and addAll wouldn't help either)

    public boolean addAll(Collection<? extends E> c) {
        Object[] a = c.toArray();
        modCount++;
        int numNew = a.length;
        if (numNew == 0)
            return false;
        Object[] elementData;
        final int s;
        if (numNew > (elementData = this.elementData).length - (s = size))
            elementData = grow(s + numNew);
        System.arraycopy(a, 0, elementData, s, numNew);
        size = s + numNew;
        return true;
    }

Curious if you have any knowledge about this and why we would be allocating the same array twice
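To confirm that both paths go through `toArray()` on the source collection, a counting wrapper is enough. The sketch below uses a hypothetical `ToArrayCountingCollection` class that is not part of the codebase; it only observes which methods `ArrayList` calls on the source and makes no claim about why the JDK copies the array afterwards.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical wrapper that counts how the consumer reads the source collection.
final class ToArrayCountingCollection<T> extends AbstractCollection<T> {
    private final Collection<T> delegate;
    int toArrayCalls;
    int iteratorCalls;

    ToArrayCountingCollection(Collection<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Iterator<T> iterator() {
        iteratorCalls++;
        return delegate.iterator();
    }

    @Override
    public int size() {
        return delegate.size();
    }

    @Override
    public Object[] toArray() {
        toArrayCalls++;
        return delegate.toArray(); // per the Collection contract, this is a fresh array
    }

    public static void main(String[] args) {
        ToArrayCountingCollection<String> source = new ToArrayCountingCollection<>(List.of("a", "b"));

        List<String> copy = new ArrayList<>(source); // copy constructor
        System.out.println("after constructor: toArray=" + source.toArrayCalls
                + ", iterator=" + source.iteratorCalls);

        copy.addAll(source); // bulk add
        System.out.println("after addAll: toArray=" + source.toArrayCalls
                + ", iterator=" + source.iteratorCalls);
    }
}
```

Each call asks the source for one array via `toArray()`; whatever `ArrayList` then does with that array (adopting it, or copying it with `Arrays.copyOf` or `System.arraycopy`) is where a second allocation or copy would come from.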

Preconditions.checkNotNull(item, "iterable cannot contain null elements");
}

set.forEach(ConjureCollections::checkNotNullElement);

Fwiw, LinkedHashSet does not seem to override Iterable's default forEach implementation, so we'll still create the iterator here afaict

Comment on lines +86 to +104
addTo.addAll(new AbstractCollection<>() {
    @Override
    public Iterator<T> iterator() {
        return new NonNullIterator<>(collection.iterator());
    }

    @Override
    public Object[] toArray() {
        Object[] array = collection.toArray();
        for (Object element : array) {
            checkNotNullElement(element);
        }
        return array;
    }

    @Override
    public int size() {
        return collection.size();
    }

In an ideal world I think we'd implement this within specialized non-null collections to avoid the new anonymous wrapper allocation on a per-call basis. Getting there would be a little tricky, because this API is fairly specialized, however it's in the internal package so we can make some assumptions about how it's used.
Probably not a substantial optimization.
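For readers following along without the full diff, the wrapper under discussion can be sketched in a self-contained form. This is an approximation under assumptions: `NonNullViews` and `nonNullView` are hypothetical names, the inline iterator stands in for `NonNullIterator` (whose source is not shown in this conversation), and `Objects.requireNonNull` replaces the project's `Preconditions` so the example compiles on its own.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the anonymous wrapper in the hunk above.
final class NonNullViews {
    private NonNullViews() {}

    static void checkNotNullElement(Object element) {
        Objects.requireNonNull(element, "elementsToAdd cannot contain null elements");
    }

    /** Returns a read-only view that rejects null elements on both the iterator and toArray paths. */
    static <T> Collection<T> nonNullView(Collection<? extends T> collection) {
        return new AbstractCollection<T>() {
            @Override
            public Iterator<T> iterator() {
                Iterator<? extends T> delegate = collection.iterator();
                // Guess at what NonNullIterator does: check each element as it is handed out.
                return new Iterator<T>() {
                    @Override
                    public boolean hasNext() {
                        return delegate.hasNext();
                    }

                    @Override
                    public T next() {
                        T element = delegate.next();
                        checkNotNullElement(element);
                        return element;
                    }
                };
            }

            @Override
            public Object[] toArray() {
                Object[] array = collection.toArray();
                for (Object element : array) {
                    checkNotNullElement(element);
                }
                return array;
            }

            @Override
            public int size() {
                return collection.size();
            }
        };
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<>();
        target.addAll(nonNullView(Arrays.asList("a", "b"))); // ArrayList reads the view via toArray()
        System.out.println(target);
    }
}
```

One view object is allocated per call, which is the per-call wrapper allocation the comment above refers to.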


I suppose another option is to make the assumption that toArray will be used, and explicitly handle that ourselves:

T[] values = elementsToAdd.toArray();
addTo.ensureCapacity[IfPossible](values.length);
for (T element : values) { // no more iterator allocation because we're iterating over an array
    Preconditions.checkNotNull(element, "elementsToAdd cannot contain null elements");
    addTo.add(element);
}
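A compilable variant of that idea might look like the sketch below. It rests on a few assumptions: `ArrayBackedAddAll` and `addAllNonNull` are hypothetical names, `Objects.requireNonNull` stands in for `Preconditions.checkNotNull`, and the "ensure capacity if possible" step is only attempted when the target happens to be an `ArrayList`. Since `Collection#toArray()` returns `Object[]` rather than `T[]`, the elements are cast back individually.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

// Hypothetical helper illustrating the "call toArray ourselves" option.
final class ArrayBackedAddAll {
    private ArrayBackedAddAll() {}

    @SuppressWarnings("unchecked") // elements came out of a Collection<? extends T>
    static <T> void addAllNonNull(Collection<T> addTo, Collection<? extends T> elementsToAdd) {
        Objects.requireNonNull(elementsToAdd, "elementsToAdd cannot be null");
        Object[] values = elementsToAdd.toArray();
        if (addTo instanceof ArrayList) {
            // ensureCapacity takes the minimum total capacity, so include the current size.
            ((ArrayList<T>) addTo).ensureCapacity(addTo.size() + values.length);
        }
        for (Object element : values) { // iterating an array allocates no iterator
            Objects.requireNonNull(element, "elementsToAdd cannot contain null elements");
            addTo.add((T) element);
        }
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<>(List.of("x"));
        addAllNonNull(target, Arrays.asList("a", "b"));
        System.out.println(target);
    }
}
```

Because the null check happens inside the same loop as the add, any elements before the first null are already in `addTo` when the exception is thrown; validating the whole array first would avoid that at the cost of a second pass.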


4 participants