
Provide filter overload to replace colGroups(predicate<ColumnGroup>) #1359


Open
wants to merge 1 commit into
base: master
from

Conversation

koperagen
Collaborator

Before this PR, this was possible:
df.select { colGroups { it.containsColumn("name") } }
but this wasn't:
df.select { colGroups().filter { it.containsColumn("name") } }
because DataColumn<DataRow> is not a ColumnGroup =( So a cast was needed.

With a small compromise, the following is now possible:
df.select { colGroups().filter { it.data.containsColumn("name") } }

I decided against building a complex inheritance hierarchy, so unlike the ColumnWithPath passed to the normal filter, our new ColumnGroupWithPath is not a ColumnGroup/DataColumn itself. It only contains a data: ColumnGroup property, the same way ColumnWithPath exposes its data. Inheritance would have required deep and, IMO, too complex changes!
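
To make the composition trade-off concrete, here is a minimal, self-contained sketch. It does not use the real DataFrame types: `ToyColumn`, `ToyValue`, `ToyGroup`, `ToyGroupWithPath` and `filterGroups` are hypothetical stand-ins. Only the shape mirrors the description above: a wrapper that holds `data` and `path` without being a column itself.

```kotlin
// Toy stand-ins for illustration only; none of these are DataFrame types.
interface ToyColumn { val name: String }

class ToyValue(override val name: String) : ToyColumn

class ToyGroup(override val name: String, val children: List<ToyColumn>) : ToyColumn {
    fun containsColumn(name: String): Boolean = children.any { it.name == name }
}

// Composition: the wrapper holds a group and its path, but is not a column itself.
class ToyGroupWithPath(val data: ToyGroup, val path: List<String>)

// Hypothetical filter: keeps only groups for which the predicate holds.
fun List<Pair<List<String>, ToyColumn>>.filterGroups(
    predicate: (ToyGroupWithPath) -> Boolean,
): List<Pair<List<String>, ToyColumn>> =
    filter { (path, column) ->
        // `column` is smart-cast to ToyGroup on the right-hand side of &&.
        column is ToyGroup && predicate(ToyGroupWithPath(column, path))
    }

fun main() {
    val columns: List<Pair<List<String>, ToyColumn>> = listOf(
        listOf("person") to ToyGroup("person", listOf(ToyValue("name"), ToyValue("age"))),
        listOf("address") to ToyGroup("address", listOf(ToyValue("city"))),
        listOf("id") to ToyValue("id"),
    )
    val selected = columns.filterGroups { it.data.containsColumn("name") }
    println(selected.map { it.first }) // prints [[person]]
}
```

The caller pays one extra `.data` hop, which is the "small compromise" mentioned above; in exchange the existing column hierarchy stays untouched.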

@koperagen koperagen added this to the 1.0.0-Beta3 milestone Aug 4, 2025
@koperagen koperagen requested a review from Jolanrensen August 4, 2025 15:29
@koperagen koperagen self-assigned this Aug 4, 2025
@koperagen koperagen added the enhancement New feature or request label Aug 4, 2025
@koperagen koperagen force-pushed the filter-column-groups branch from cbf5652 to c32a2ef Compare August 4, 2025 15:40
I prefer composition to inheritance here because changing the ColumnWithPath hierarchy seems difficult:
a lot of code assumes there is only one possible implementation. So, for the sake of a single `filter` overload, let's introduce a simple new class
@koperagen koperagen force-pushed the filter-column-groups branch from c32a2ef to a0a3f44 Compare August 4, 2025 16:19
*/
@Suppress("INAPPLICABLE_JVM_NAME")
@JvmName("filterColumnGroups")
public fun ColumnSet<AnyRow>.filter(predicate: Predicate<ColumnGroupWithPath<*>>): ColumnSet<*> =
Collaborator

I think this intermediate type is not needed, right?
colGroups {} takes a Predicate<ColumnGroup<*>>; we can duplicate that here:

fun ColumnSet<AnyRow>.filter(predicate: Predicate<ColumnGroup<*>>): ColumnSet<*> =
    colsInternal { columnWithPath ->
        columnWithPath.isColumnGroup() && predicate(columnWithPath)
    }

Collaborator

.isColumnGroup() does an instance check and smart-casts the receiver, so this is safe :)
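
For readers unfamiliar with the mechanism, here is a small sketch of how a check function can smart-cast its receiver, assuming it is done with a Kotlin contract. The types and the `isGroup` function are hypothetical stand-ins, not the DataFrame API.

```kotlin
import kotlin.contracts.ExperimentalContracts
import kotlin.contracts.contract

// Toy stand-ins for illustration only; none of these are DataFrame types.
interface ToyColumn { val name: String }

class ToyValue(override val name: String) : ToyColumn

class ToyGroup(override val name: String, val children: List<ToyColumn>) : ToyColumn

// Hypothetical check: the contract tells the compiler that a `true` result
// means the receiver is a ToyGroup, so callers get a smart cast.
@OptIn(ExperimentalContracts::class)
fun ToyColumn.isGroup(): Boolean {
    contract { returns(true) implies (this@isGroup is ToyGroup) }
    return this is ToyGroup
}

fun childCountIfGroup(column: ToyColumn): Int? =
    // No explicit cast: `column.children` compiles because of the contract.
    if (column.isGroup()) column.children.size else null

fun main() {
    val group: ToyColumn = ToyGroup("person", listOf(ToyValue("name"), ToyValue("age")))
    val plain: ToyColumn = ToyValue("id")
    println(childCountIfGroup(group)) // prints 2
    println(childCountIfGroup(plain)) // prints null
}
```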

Collaborator Author

Yes, but it won't hurt to have the path property available, the same as ColumnSet<*>.filter provides.

Collaborator

Now I understand what you're trying to achieve :)
Still, we'll have to take a closer look, because I see we already have the ColumnGroupWithPathImpl class. Adding a completely separate ColumnGroupWithPath would be confusing at best.

Collaborator Author

@koperagen koperagen Aug 5, 2025


As an alternative, we could rename and expose ColumnGroupWithPathImpl, but:

  1. ColumnWithPath is used extensively across the whole column selection DSL, and ColumnGroupWithPathImpl is an important implementation detail.
  2. ColumnGroupWithPath is needed for only one filter overload and is very simple: it's just a class with 2 properties.

Collaborator

What if we don't introduce a new class and just create:

@Suppress("UNCHECKED_CAST")
@get:JvmName("columnAsColumnGroup")
public val <T> ColumnWithPath<DataRow<T>>.column: ColumnGroup<T>
    get() = data as ColumnGroup<T>

@get:JvmName("columnAsFrameColumn")
public val <T> ColumnWithPath<DataFrame<T>>.column: FrameColumn<T>
    get() = data as FrameColumn<T>

public val <T> ColumnWithPath<T>.column: DataColumn<T>
    get() = data

I think the name "column" is easier to discover in the ColumnWithPath class than "data".

We could even try to find a way to hide data in ColumnWithPath, but that probably won't even be necessary.

(btw I tried calling them "data" with @kotlin.internal.HidesMembers, but unfortunately that doesn't seem to work :( )
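
As a sanity check of the idea, here is a self-contained sketch of receiver-specific extension properties that share one name. All types are hypothetical toy stand-ins rather than the DataFrame classes; the point is only that the compiler picks the most specific receiver, so `column` has a narrower type when the element type is statically known to be a row.

```kotlin
// Toy stand-ins for illustration only; none of these are DataFrame types.
class ToyRow(val fields: Map<String, Any?>)

open class ToyCol<T>(val values: List<T>)

class ToyGroupCol(values: List<ToyRow>) : ToyCol<ToyRow>(values) {
    fun containsColumn(name: String): Boolean = values.any { name in it.fields }
}

class ToyWithPath<T>(val data: ToyCol<T>, val path: List<String>)

// More specific receiver: chosen when the element type is statically ToyRow.
// The cast fails at runtime if `data` is not actually a ToyGroupCol.
@get:JvmName("columnAsGroup")
val ToyWithPath<ToyRow>.column: ToyGroupCol
    get() = data as ToyGroupCol

// Generic fallback for every other element type.
@get:JvmName("columnAsCol")
val <T> ToyWithPath<T>.column: ToyCol<T>
    get() = data

fun main() {
    val person = ToyWithPath<ToyRow>(
        ToyGroupCol(listOf(ToyRow(mapOf("name" to "Alice")))),
        listOf("person"),
    )
    val id = ToyWithPath(ToyCol(listOf(1, 2, 3)), listOf("id"))

    // Resolves to the ToyRow overload, so group-only members are available.
    println(person.column.containsColumn("name")) // prints true
    // Resolves to the generic overload.
    println(id.column.values) // prints [1, 2, 3]
}
```

The `@get:JvmName` annotations follow the snippet above and give each getter its own JVM name.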

Collaborator Author

Good idea, I'll try.

Collaborator Author

In the CS DSL, though, the scope for DataColumn<DataRow<*>> is very polluted =( There are already functions named column. IMO it's totally impossible to find anything just by looking at completion.
image
image

But if a person knows what they're looking for, they can use asColumnGroup even without column.

For comparison, completion for ColumnGroupWithPath looks clearer:
image

I tried to limit the visibility of functions from the outer scope with DslMarker, but I'm afraid everything is still visible despite the DSL scope violation errors.

So I'd go with ColumnGroupWithPath for better discoverability.

Labels
enhancement New feature or request

2 participants