Custom DataFrame KtLint ruleset for our users? #1594

@Jolanrensen

As I learned in the talk from @paul-dingemans at Kotlin Dev Day, it's possible to create a custom KtLint ruleset which would allow us to define a recommended style for using DataFrame in cases where it deviates from "ordinary" Kotlin.

There are a couple of places where this would make sense. If you look at our project, you can see we've suppressed KtLint here as well, or disabled a rule in our .editorconfig entirely.

There might be more, but these two seem like good candidates:

DataFrame operation chaining

DataFrame operations often use "intermediate classes" in their DSL to form a sort of sentence that describes what needs to be done with the data. It's fine to put these intermediate steps on separate lines when only a single operation is called on a dataframe:

dataFrame
    .update { colA and colB }
    .where { it > 10 }
    .with { 100 * it }

However, when multiple operations happen one after another, it's hard to tell where the instruction for one ends and the next begins:

dataFrame
    .update { colA and colB }
    .where { it > 10 }
    .with { 100 * it }
    .split { colC }
    .by(",")
    .into { "colC$it" }

In our examples and tests, we suppress ktlint_standard_chain-method-continuation and write it like this:

dataFrame
    .update { colA and colB }.where { it > 10 }.with { 100 * it }
    .split { colC }.by(",").into { "colC$it" }
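The grouping idea behind this style can be sketched in plain Kotlin. This is only an illustration of the line-breaking decision, not KtLint API: a real rule would work on the AST, and `operationStarters`, `Call` and `formatChain` are hypothetical names. The set of "starter" calls is an assumption as well and would need to cover the whole DataFrame DSL.

```kotlin
// Hypothetical sketch, not KtLint API. The set below is an assumption:
// names of calls that start a new DataFrame operation "sentence".
val operationStarters = setOf("update", "split", "convert", "move", "insert")

/** One call in a chain, e.g. name = "where", args = "{ it > 10 }". */
data class Call(val name: String, val args: String)

/**
 * Renders a receiver followed by chained calls. A line break is emitted only
 * before calls that start a new operation; continuation calls stay on the
 * same line as the operation they belong to.
 */
fun formatChain(receiver: String, calls: List<Call>, indent: String = "    "): String {
    val sb = StringBuilder(receiver)
    for (call in calls) {
        if (call.name in operationStarters) sb.append('\n').append(indent)
        sb.append('.').append(call.name)
        // Lambda arguments get a space (`.update { ... }`), value arguments do not (`.by(",")`).
        if (call.args.startsWith("{")) sb.append(' ')
        sb.append(call.args)
    }
    return sb.toString()
}

fun main() {
    val chain = listOf(
        Call("update", "{ colA and colB }"),
        Call("where", "{ it > 10 }"),
        Call("with", "{ 100 * it }"),
        Call("split", "{ colC }"),
        Call("by", "(\",\")"),
        Call("into", "{ \"colC\$it\" }"),
    )
    // Prints the chain with one line per operation.
    println(formatChain("dataFrame", chain))
}
```

The important design choice is that the break decision depends on which call it is, not on line length, which is what distinguishes this from the standard chain-method-continuation behaviour.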

If the contents of an invocation are too long, we could still recommend keeping the continuation on the same line, while requiring a full line break between operations:

dataFrame
    .update {
        colA and colB
    }.where {
        it > 10
    }.with {
        100 * it
    }
    .split {
        colC
    }.by(
        ",",
    ).into {
        "colC$it"
    }

or a mix like:

dataFrame.update { colA and colB }.where { it > 10 }.with {
        100 * it
    }
    .split { colC }.by(",").into {
        "colC$it"
    }

dataFrameOf(header)(values)

Take a look at:

dataFrameOf("firstName", "lastName", "age", "city", "weight", "isHappy")(
    "Alice", "Cooper", 15, "London", 54, true,
    "Bob", "Dylan", 45, "Dubai", 87, true,
    "Charlie", "Daniels", 20, "Moscow", null, false,
    "Charlie", "Chaplin", 40, "Milan", null, true,
    "Bob", "Marley", 30, "Tokyo", 68, true,
    "Alice", "Wolf", 20, null, 55, false,
    "Charlie", "Byrd", 30, "Moscow", 90, true,
)

Writing out all values like this is only possible if we suppress ktlint:standard:argument-list-wrapping. Otherwise, it will be formatted like:

dataFrameOf("firstName", "lastName", "age", "city", "weight", "isHappy")(
    "Alice",
    "Cooper",
    15,
    "London",
    54,
    true,
    "Bob",
    "Dylan",
    45,
    "Dubai",
    87,
    true,
    ...
)

This loses all readability of the function and makes it harder to use. Maybe we could create a custom rule for DataFrameBuilderInvoke0 that could ignore this function or format it like this:

dataFrameOf("firstName", "lastName", "age", "city", "weight", "isHappy")(
    "Alice",   "Cooper",  15, "London", 54,   true,
    "Bob",     "Dylan",   45, "Dubai",  87,   true,
    "Charlie", "Daniels", 20, "Moscow", null, false,
    "Charlie", "Chaplin", 40, "Milan",  null, true,
    "Bob",     "Marley",  30, "Tokyo",  68,   true,
    "Alice",   "Wolf",    20, null,     55,   false,
    "Charlie", "Byrd",    30, "Moscow", 90,   true,
)
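As a rough sketch of what such an aligning formatter would have to compute, assuming the number of columns is known from the header call: split the arguments into rows, find the widest cell per column, and pad every cell to that width. The function name `alignArguments` is hypothetical and this is not KtLint API; an actual rule would read the arguments from the AST rather than from a list of strings.

```kotlin
/**
 * Lays out already-rendered argument strings in rows of `columns` cells,
 * padding each cell so that the columns line up. Every cell keeps its
 * trailing comma. Hypothetical helper for illustration only.
 */
fun alignArguments(args: List<String>, columns: Int, indent: String = "    "): String {
    require(columns > 0 && args.isNotEmpty() && args.size % columns == 0) {
        "args must fill one or more whole rows"
    }
    val rows = args.chunked(columns)
    // Width of each column: its longest cell plus one for the trailing comma.
    val widths = (0 until columns).map { c -> rows.maxOf { it[c].length } + 1 }
    return rows.joinToString("\n") { row ->
        row.mapIndexed { c, cell -> "$cell,".padEnd(widths[c]) }
            .joinToString(" ")
            .trimEnd() // no padding after the last cell of a row
            .let { indent + it }
    }
}

fun main() {
    val args = listOf(
        "\"Alice\"", "\"Cooper\"", "15", "\"London\"", "54", "true",
        "\"Bob\"", "\"Dylan\"", "45", "\"Dubai\"", "87", "true",
        "\"Charlie\"", "\"Daniels\"", "20", "\"Moscow\"", "null", "false",
    )
    println(alignArguments(args, columns = 6))
}
```

Whether the rule should rewrite the call like this or only leave hand-aligned code untouched is an open question; ignoring the call is the simpler option.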

Maybe even like this!

dataFrameOf(
    "firstName", "lastName", "age", "city",   "weight", "isHappy",
)(
    "Alice",     "Cooper",   15,    "London", 54,       true,
    "Bob",       "Dylan",    45,    "Dubai",  87,       true,
    "Charlie",   "Daniels",  20,    "Moscow", null,     false,
    "Charlie",   "Chaplin",  40,    "Milan",  null,     true,
    "Bob",       "Marley",   30,    "Tokyo",  68,       true,
    "Alice",     "Wolf",     20,    null,     55,       false,
    "Charlie",   "Byrd",     30,    "Moscow", 90,       true,
)

(Yes, if we recommend that people use dataFrameOf(vararg Pair<String, Column>) instead, this won't be necessary anymore.)

Labels: research (This requires a deeper dive to gather a better understanding)
