
writeArrowFeather not working with nested types? #271

@phodal

Hi, in my case I want to create an Arrow file on the client side and then pass it to the server side. But when I just try to run writeArrowFeather, it throws an IndexOutOfBoundsException:

Exception in thread "main" java.lang.IndexOutOfBoundsException: index: 31393, length: 2320 (expected: range(0, 32768))
	at org.apache.arrow.memory.ArrowBuf.checkIndex(ArrowBuf.java:701)
	at org.apache.arrow.memory.ArrowBuf.setBytes(ArrowBuf.java:765)
	at org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1244)
	at org.apache.arrow.vector.BaseVariableWidthVector.set(BaseVariableWidthVector.java:1059)
	at org.apache.arrow.vector.VarCharVector.set(VarCharVector.java:255)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWriterImpl$infillVector$1.invoke(ArrowWriterImpl.kt:111)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWriterImpl$infillVector$1.invoke(ArrowWriterImpl.kt:111)
	at org.jetbrains.kotlinx.dataframe.api.ForEachKt.forEachIndexed(forEach.kt:34)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWriterImpl.infillVector(ArrowWriterImpl.kt:111)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWriterImpl.allocateVectorAndInfill(ArrowWriterImpl.kt:197)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWriterImpl.allocateVectorSchemaRoot(ArrowWriterImpl.kt:223)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWriter$DefaultImpls.writeArrowFeather(ArrowWriter.kt:114)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWriterImpl.writeArrowFeather(ArrowWriterImpl.kt:61)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWriter$DefaultImpls.writeArrowFeather(ArrowWriter.kt:125)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWriterImpl.writeArrowFeather(ArrowWriterImpl.kt:61)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWriter$DefaultImpls.writeArrowFeather(ArrowWriter.kt:133)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWriterImpl.writeArrowFeather(ArrowWriterImpl.kt:61)
	at org.jetbrains.kotlinx.dataframe.io.ArrowWritingKt.writeArrowFeather(arrowWriting.kt:89)
	at com.phodal.chapi.arrow.MainKt.main(Main.kt:26)
	Suppressed: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (33024)
Allocator(ROOT) 0/33024/264192/9223372036854775807 (res/actual/peak/limit)

		at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:437)
		at org.apache.arrow.memory.RootAllocator.close(RootAllocator.java:29)
		at org.jetbrains.kotlinx.dataframe.io.ArrowWriterImpl.close(ArrowWriterImpl.kt:247)
		at kotlin.jdk7.AutoCloseableKt.closeFinally(AutoCloseable.kt:64)
		at org.jetbrains.kotlinx.dataframe.io.ArrowWritingKt.writeArrowFeather(arrowWriting.kt:88)
		... 1 more

FAILURE: Build failed with an exception.
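Two parts of the trace above can be reproduced with plain Kotlin, without Arrow or DataFrame on the classpath. First, the bounds check fails because 31393 + 2320 = 33713, which is 945 bytes past the 32768-byte buffer. The trace shows `VarCharVector.set`; in Arrow Java, `set` assumes the buffer is already large enough, whereas `setSafe` grows it first, which may be why a long string value (possibly the text form of a nested column) overflows. Second, the "Suppressed: ... Memory was leaked" entry comes from Kotlin's `use` (the trace goes through `closeFinally`): when the body throws and `close()` also throws, the close failure is attached to the original exception as suppressed. In this sketch, `isOutOfBounds`, `LeakCheckingAllocator` and `failingWrite` are hypothetical stand-ins, not Arrow or DataFrame code.

```kotlin
// Restates the condition from the message "index: I, length: L
// (expected: range(0, C))": writing L bytes at offset I fits only if
// I + L <= C. A simplified sketch, not Arrow's actual source code.
fun isOutOfBounds(index: Long, length: Long, capacity: Long): Boolean =
    index < 0 || length < 0 || index + length > capacity

// Hypothetical stand-in for the root allocator: its close() fails because
// buffers allocated before the crash were never released.
class LeakCheckingAllocator : AutoCloseable {
    override fun close() {
        throw IllegalStateException("Memory was leaked by query.")
    }
}

// Mimics the shape of the trace: the write fails inside `use`, and the
// failure from close() is attached to it as a suppressed exception.
fun failingWrite(): Throwable? =
    try {
        LeakCheckingAllocator().use {
            val index = 31393L
            val length = 2320L
            val capacity = 32768L
            if (isOutOfBounds(index, length, capacity)) {
                throw IndexOutOfBoundsException(
                    "index: $index, length: $length (expected: range(0, $capacity))"
                )
            }
        }
        null // never reached: close() always throws in this sketch
    } catch (e: Throwable) {
        e
    }

fun main() {
    println(31393 + 2320) // 33713: 945 bytes past the 32768-byte buffer
    val primary = failingWrite()
    println(primary?.javaClass?.simpleName) // IndexOutOfBoundsException
    println(primary?.suppressedExceptions?.map { it.javaClass.simpleName }) // [IllegalStateException]
}
```

The primary exception stays the IndexOutOfBoundsException, so the leak report is a consequence of the failed write rather than a separate bug.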

Here is my demo code for the writer, with some debug output:

val dataFrame = DataFrame.read("https://raw.githubusercontent.com/phodal-archive/apache-arrow-chapi-demo/master/data/0_codes.json")
dataFrame.schema().print()

val toArrowSchema = dataFrame.columns().toArrowSchema()
println(toArrowSchema.toJson())

dataFrame.writeArrowFeather(File("codes.arrow"))

When I debug it, dataFrame.schema().print() returns the correct schema:

NodeName: String
Module: String
Type: String
Package: String?
FilePath: String
Fields: *
    TypeType: String
    TypeKey: String
    Modifiers: List<String>
    TypeValue: String?
    Annotations: *
        Name: String
        KeyValues: *
            Key: String
            Value: String


Implements: List<String>
Functions: *
    Name: String
    Package: String?
    ReturnType: String
    Parameters: *
        TypeValue: String
        TypeType: String
    FunctionCalls: *
        Package: String?
        NodeName: String?
        FunctionName: String
        Position:
            StartLine: Int
            StartLinePosition: Int
            StopLine: Int
            StopLinePosition: Int
        Parameters: *
            TypeValue: String
            TypeType: String
        Type: String?
    Position:
        StartLine: Int
        StartLinePosition: Int?
        StopLine: Int
        StopLinePosition: Int?
    LocalVariables: *
        TypeValue: String
        TypeType: String
    IsConstructor: Boolean?
    Annotations: *
        Name: String
        KeyValues: *
            Key: String
            Value: String


Imports: *
    Source: String
    AsName: String
Position:
    StartLine: Int?
    StopLine: Int?
    StartLinePosition: Int?
    StopLinePosition: Int?
Annotations: *
    Name: String
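To see which top-level columns in the printed schema are nested or list-typed, the output can be scanned mechanically. This sketch assumes the notation shown above: `*` marks a frame column, an empty type after the colon marks a column group whose children are indented below it, and `List<...>` marks a list column. The function name `nestedTopLevelColumns` is hypothetical, and the input is the printed text pasted into a string.

```kotlin
// Scans text printed by dataFrame.schema().print() and returns the names of
// top-level columns that are not plain scalars. Assumed notation:
//   "Name: *"        -> frame column
//   "Name:"          -> column group (children indented below)
//   "Name: List<T>"  -> list column
fun nestedTopLevelColumns(printedSchema: String): List<String> =
    printedSchema.lines()
        .filter { it.isNotBlank() && !it.first().isWhitespace() } // top level only
        .mapNotNull { line ->
            val name = line.substringBefore(':').trim()
            val type = line.substringAfter(':').trim()
            val nested = type == "*" || type.isEmpty() || type.startsWith("List<")
            if (nested) name else null
        }

fun main() {
    val schema = """
        NodeName: String
        Package: String?
        Fields: *
            TypeType: String
        Implements: List<String>
        Position:
            StartLine: Int?
    """.trimIndent()
    println(nestedTopLevelColumns(schema)) // [Fields, Implements, Position]
}
```

Run over the full schema above, it lists Fields, Implements, Functions, Imports, Position and Annotations, which are exactly the fields that appear as nullable utf8 with no children in the Arrow schema.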

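But dataFrame.columns().toArrowSchema() produces the wrong types: every nested or list column (Fields, Implements, Functions, Imports, Position, Annotations) becomes a nullable utf8 field with no children: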

{
  "fields" : [ {
    "name" : "NodeName",
    "nullable" : false,
    "type" : {
      "name" : "utf8"
    },
    "children" : [ ]
  }, {
    "name" : "Module",
    "nullable" : false,
    "type" : {
      "name" : "utf8"
    },
    "children" : [ ]
  }, {
    "name" : "Type",
    "nullable" : false,
    "type" : {
      "name" : "utf8"
    },
    "children" : [ ]
  }, {
    "name" : "Package",
    "nullable" : true,
    "type" : {
      "name" : "utf8"
    },
    "children" : [ ]
  }, {
    "name" : "FilePath",
    "nullable" : false,
    "type" : {
      "name" : "utf8"
    },
    "children" : [ ]
  }, {
    "name" : "Fields",
    "nullable" : true,
    "type" : {
      "name" : "utf8"
    },
    "children" : [ ]
  }, {
    "name" : "Implements",
    "nullable" : true,
    "type" : {
      "name" : "utf8"
    },
    "children" : [ ]
  }, {
    "name" : "Functions",
    "nullable" : true,
    "type" : {
      "name" : "utf8"
    },
    "children" : [ ]
  }, {
    "name" : "Imports",
    "nullable" : true,
    "type" : {
      "name" : "utf8"
    },
    "children" : [ ]
  }, {
    "name" : "Position",
    "nullable" : true,
    "type" : {
      "name" : "utf8"
    },
    "children" : [ ]
  }, {
    "name" : "Annotations",
    "nullable" : true,
    "type" : {
      "name" : "utf8"
    },
    "children" : [ ]
  } ]
}
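For comparison, the printed schema suggests what a nested-aware conversion could produce. Arrow has `struct` and `list` types, so one plausible mapping is: column group to struct, frame column (`*`) to a list of structs, and `List<String>` to a list of utf8. The sketch below models that mapping with made-up types (`Col`, `Value`, `ListCol`, `Group`, `Frame`) that are not part of Kotlin DataFrame's API; it ignores nullability and prints types in an informal notation rather than Arrow's JSON schema format.

```kotlin
// Hypothetical, simplified model of DataFrame column kinds; these types are
// NOT part of Kotlin DataFrame's API and exist only for illustration.
sealed interface Col { val name: String }
data class Value(override val name: String, val kotlinType: String) : Col
data class ListCol(override val name: String, val elementType: String) : Col
data class Group(override val name: String, val children: List<Col>) : Col
data class Frame(override val name: String, val columns: List<Col>) : Col

// Maps a few Kotlin scalar types to Arrow type names. "utf8" and "bool"
// match the JSON above; "int32" is informal shorthand. Nullability ("?")
// is dropped in this sketch.
fun scalar(kotlinType: String): String = when (kotlinType.removeSuffix("?")) {
    "String" -> "utf8"
    "Int" -> "int32"
    "Boolean" -> "bool"
    else -> error("unsupported type in this sketch: $kotlinType")
}

// Renders child fields as "name: type" pairs.
fun fields(cols: List<Col>): String =
    cols.joinToString(", ") { "${it.name}: ${arrowType(it)}" }

// One plausible nested mapping: group -> struct, frame -> list of struct,
// List<T> -> list of T.
fun arrowType(col: Col): String = when (col) {
    is Value -> scalar(col.kotlinType)
    is ListCol -> "list<${scalar(col.elementType)}>"
    is Group -> "struct<${fields(col.children)}>"
    is Frame -> "list<struct<${fields(col.columns)}>>"
}

fun main() {
    val imports = Frame("Imports", listOf(Value("Source", "String"), Value("AsName", "String")))
    val position = Group("Position", listOf(Value("StartLine", "Int?"), Value("StopLine", "Int?")))
    println(arrowType(imports))  // list<struct<Source: utf8, AsName: utf8>>
    println(arrowType(position)) // struct<StartLine: int32, StopLine: int32>
    println(arrowType(ListCol("Implements", "String"))) // list<utf8>
}
```

Under this mapping none of the six columns above would be utf8; each would carry its children in the schema.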

Am I missing something?

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests