
Incorrect behavior when collecting a filtered iterator into a BooleanArray #8505

@tobixdev

Description


Describe the bug

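Collecting into a BooleanArray produces unintuitive results when the upper bound of the iterator's size_hint overestimates the number of items it actually yields: the extra slots show up as trailing nulls. At least, that is what I think is happening after looking at the code.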

Is this intended behavior? If not, I could try to come up with a fix.
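To illustrate what I suspect is happening, here is a hypothetical, simplified model (std-only; the function collect_by_upper_bound is made up and is not the actual arrow-rs code) of a collector that sizes its output from the size_hint upper bound and leaves unfilled slots as null. Filter cannot know in advance how many items will pass, so for a 4-element source it reports (0, Some(4)), and the fourth slot is never written.

```rust
/// Hypothetical, simplified model of the suspected behavior: the output
/// length is taken from the iterator's `size_hint` upper bound instead of
/// the number of items actually yielded. NOT the actual arrow-rs code.
fn collect_by_upper_bound<I>(iter: I) -> Vec<Option<bool>>
where
    I: IntoIterator<Item = Option<bool>>,
{
    let iter = iter.into_iter();
    // Trust the upper bound as the final length; panic if there is none.
    let len = iter.size_hint().1.expect("iterator must have an upper bound");
    // Every slot starts out as null...
    let mut out = vec![None; len];
    // ...and only the slots the iterator actually reaches get overwritten.
    for (i, item) in iter.enumerate() {
        out[i] = item;
    }
    out
}

fn main() {
    let filtered = vec![Some(true), None, Some(true), Some(false)]
        .into_iter()
        .filter(Option::is_some);
    // `Filter` cannot know how many items pass, so its hint is (0, Some(4)).
    assert_eq!(filtered.size_hint(), (0, Some(4)));

    let out = collect_by_upper_bound(filtered);
    // prints "[Some(true), Some(true), Some(false), None]":
    // three real items plus one trailing null from the overestimated bound.
    println!("{:?}", out);
}
```

This reproduces the same shape as the snapshot in the test below, which is why I suspect the upper bound is being used as the array length.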

To Reproduce

Tested with Arrow v56.2.0 (via DataFusion 50)

The following test reproduces this:

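    #[test]
    fn test_boolean_array_from() {
        use arrow::array::BooleanArray;
        use insta::assert_debug_snapshot;

        // Only 3 of the 4 items pass the filter, but `Filter::size_hint`
        // still reports an upper bound of 4.
        let values = vec![Some(true), None, Some(true), Some(false)]
            .into_iter()
            .filter(Option::is_some)
            .collect::<BooleanArray>();
        // Inline snapshot of the actual output (note the trailing null).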
        assert_debug_snapshot!(values, @r"
        BooleanArray
        [
          true,
          true,
          false,
          null,
        ]
        ")
    }

Expected behavior

I would have expected the following array (without the trailing null):

        BooleanArray
        [
          true,
          true,
          false,
        ]
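For reference, a collector that produces the expected result only needs to treat size_hint as a capacity hint and derive the length from the items it actually receives. The sketch below is a hypothetical toy type (ToyBooleanArray, with bit-packed values and a validity bitmap in the spirit of Arrow's layout); it is not the arrow-rs API, just an illustration of the push-based approach.

```rust
/// Hypothetical toy stand-in for a boolean array: bit-packed values plus a
/// validity bitmap, loosely modeled on Arrow's layout. NOT the arrow-rs API.
#[derive(Debug, Default)]
struct ToyBooleanArray {
    values: Vec<u8>,   // bit i set => item i is true
    validity: Vec<u8>, // bit i set => item i is non-null
    len: usize,        // number of items actually pushed
}

impl ToyBooleanArray {
    fn push(&mut self, item: Option<bool>) {
        let (byte, bit) = (self.len / 8, self.len % 8);
        if bit == 0 {
            // Grow both bitmaps by one zeroed byte every 8 items.
            self.values.push(0);
            self.validity.push(0);
        }
        if let Some(v) = item {
            self.validity[byte] |= 1 << bit;
            if v {
                self.values[byte] |= 1 << bit;
            }
        }
        self.len += 1;
    }

    fn get(&self, i: usize) -> Option<bool> {
        assert!(i < self.len, "index out of bounds");
        let (byte, bit) = (i / 8, i % 8);
        if self.validity[byte] & (1 << bit) == 0 {
            None
        } else {
            Some(self.values[byte] & (1 << bit) != 0)
        }
    }
}

impl FromIterator<Option<bool>> for ToyBooleanArray {
    fn from_iter<I: IntoIterator<Item = Option<bool>>>(iter: I) -> Self {
        let iter = iter.into_iter();
        // Use the lower bound only to pre-reserve memory, never as the length.
        let bytes = iter.size_hint().0.div_ceil(8);
        let mut out = ToyBooleanArray::default();
        out.values.reserve(bytes);
        out.validity.reserve(bytes);
        for item in iter {
            out.push(item);
        }
        out
    }
}

fn main() {
    let arr: ToyBooleanArray = vec![Some(true), None, Some(true), Some(false)]
        .into_iter()
        .filter(Option::is_some)
        .collect();
    let items: Vec<Option<bool>> = (0..arr.len).map(|i| arr.get(i)).collect();
    // prints "3 [Some(true), Some(true), Some(false)]"
    println!("{} {:?}", arr.len, items);
}
```

Because the length is counted while pushing, an overestimated upper bound can only waste reserved capacity; it can never add phantom null slots.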

Additional context

For comparison, the equivalent operation on an Int64Array produces the expected result:

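    #[test]
    fn test_int64_array_from() {
        use arrow::array::Int64Array;
        use insta::assert_debug_snapshot;

        // Same filter as above; here the array holds exactly the 3 kept values.
        let values = vec![Some(1), None, Some(2), Some(3)]
            .into_iter()
            .filter(Option::is_some)
            .collect::<Int64Array>();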
        assert_debug_snapshot!(values, @r"
        PrimitiveArray<Int64>
        [
          1,
          2,
          3,
        ]
        ")
    }
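Until this is resolved, a possible workaround (an assumption on my part, not verified against Arrow v56.2.0) is to materialize the filtered items into a Vec<Option<bool>> first and build the array from that, for example via BooleanArray::from(vec), which I believe arrow provides for Vec<Option<bool>>. The std-only sketch below only checks the precondition that makes this plausible: once the items are buffered, iterating them again yields an exact size hint, so a collector that trusts the upper bound would allocate exactly 3 slots.

```rust
fn main() {
    // Materialize the filtered items first; the Vec knows its exact length.
    let buffered: Vec<Option<bool>> = vec![Some(true), None, Some(true), Some(false)]
        .into_iter()
        .filter(Option::is_some)
        .collect();

    // Iterating the buffered Vec gives an exact hint: lower == upper == 3.
    let iter = buffered.iter().copied();
    assert_eq!(iter.size_hint(), (3, Some(3)));

    // prints "[Some(true), Some(true), Some(false)]"
    println!("{:?}", buffered);
}
```

This costs an extra allocation and copy, so it is only a stopgap; the proper fix would be for the BooleanArray collector itself not to rely on the upper bound.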

Metadata

Assignees: No one assigned

Labels: arrow (Changes to the arrow crate), bug