Refactoring to expose clean metadata and infoset walkers.
This was designed to support integration of Daffodil directly
(without intermediate XML or JSON or even EXI) to other
data handling tools, specifically, Apache Drill.

InfosetElement and related InfosetNode traits are now in
runtime1.api package.

InfosetOutputter now has methods which use InfosetElement and
InfosetArray traits as the objects passed to the handler methods. This
improves the API over making DIArray, DISimple, and DIComplex visible.
(Though it is backward incompatible.) Also removed use of DIComplex,
DISimple from most SAPI and JAPI tests. The SAX InfosetOutputter still
downcasts somewhat to the DINode classes.

Added Metadata, ElementMetadata, etc. (also runtime1.api package) which
provide limited exposure to the RuntimeData and CompilerInfo information.

Added MetadataHandler, whose methods are called by MetadataWalker, which
can be invoked from DataProcessor. Added a unit test for metadata and
data walking to the core module.
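
The handler/walker split described above can be illustrated with a
small standalone sketch. Everything in it (Meta, SimpleMeta,
ComplexMeta, Handler, walk, Gather) is a hypothetical stand-in written
for this illustration, not the Daffodil API; it only shows the general
shape of a depth-first walk that reports start and end events to a
callback object.

```scala
object MetaWalkSketch {
  // Stand-in metadata types for illustration only (NOT the Daffodil classes).
  sealed trait Meta { def name: String }
  final case class SimpleMeta(name: String) extends Meta
  final case class ComplexMeta(name: String, children: List[Meta]) extends Meta

  // Hypothetical callback interface: one event per simple node,
  // a start/end pair per complex node.
  trait Handler {
    def simple(m: SimpleMeta): Unit
    def startComplex(m: ComplexMeta): Unit
    def endComplex(m: ComplexMeta): Unit
  }

  // Depth-first walk: start event, then children in order, then end event.
  def walk(m: Meta, h: Handler): Unit = m match {
    case s: SimpleMeta => h.simple(s)
    case c: ComplexMeta =>
      h.startComplex(c)
      c.children.foreach(child => walk(child, h))
      h.endComplex(c)
  }

  // A handler that records the order in which events arrive.
  final class Gather extends Handler {
    private val buf = scala.collection.mutable.ArrayBuffer.empty[String]
    def simple(m: SimpleMeta): Unit = { buf += m.name }
    def startComplex(m: ComplexMeta): Unit = { buf += s"start ${m.name}" }
    def endComplex(m: ComplexMeta): Unit = { buf += s"end ${m.name}" }
    def result: Vector[String] = buf.toVector
  }

  // Walks a tiny tree: complex element e1 containing simple element s1.
  def demo(): Vector[String] = {
    val g = new Gather
    walk(ComplexMeta("e1", List(SimpleMeta("s1"))), g)
    g.result
  }
}
```

A consumer such as a query engine would supply its own handler and
build its schema representation from the events, without touching the
tree types directly.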

Walking the runtime1 metadata is easier than walking the DSOM tree.
Data fabrics like Apache Drill interface to runtime1 specifically, not
to the schema compiler, so it is natural for the runtime1 metadata
structures and data structures to be the ones driving the interfacing.

The InfosetNode types were always supposed to be the API, the DINodes
the implementation. This solves the issue of what classes should show
through to SAPI and JAPI about the infoset nodes. It should be the
InfosetNode types, not the DINode types.

Furthermore, the InfosetNode types can have methods to access the
runtime metadata information needed by walkers, InfosetOutputters,
etc.

These hide our infoset implementation and runtime metadata (RuntimeData
classes) implementations.

Note: Nothing has changed with InfosetInputters, as those are not
needed for Apache Drill integration - which is parse-only.

BlobMethodMixin factored out of InfosetOutputter as a shared
implementation trait for the basic blob implementation.

Added features to SchemaUtils to avoid the "tns" prefix
definition (which is now officially frowned upon).

Added isHidden to SequenceRuntimeData. Needed to avoid walking
hidden elements that appear in metadata structures, but aren't
relevant as they do not appear in InfosetOutputter events.
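
As a rough illustration of why a hidden flag on sequences helps a
walker, the sketch below prunes any sequence marked hidden before
visiting its children. The types (Term, ElementTerm, SeqGroup) and the
function visibleNames are hypothetical stand-ins for this example and
are not Daffodil's SequenceRuntimeData or its walker.

```scala
object HiddenSkipSketch {
  // Stand-in term types for illustration only (NOT the Daffodil classes).
  sealed trait Term
  final case class ElementTerm(name: String) extends Term
  final case class SeqGroup(isHidden: Boolean, children: List[Term]) extends Term

  // Collect element names in document order, skipping every hidden
  // sequence together with everything nested inside it.
  def visibleNames(t: Term): List[String] = t match {
    case ElementTerm(n)        => List(n)
    case SeqGroup(true, _)     => Nil
    case SeqGroup(false, kids) => kids.flatMap(visibleNames)
  }

  // A visible sequence holding a hidden "len" group followed by "s1".
  def demo(): List[String] = {
    val hiddenLen = SeqGroup(isHidden = true, List(ElementTerm("len")))
    val body = SeqGroup(isHidden = false, List(hiddenLen, ElementTerm("s1")))
    visibleNames(body)
  }
}
```

In this sketch the walk yields only s1, so the names a consumer sees in
the metadata match the elements it will later receive as data events.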

Improved Infoset API access to simple type methods

They now use the DFDL type names, not the underlying implementation
type names.

So a decimal is accessed via getDecimal, not getBigDecimal. Improved
the Javadoc accordingly. The methods throw a predictable exception on
conversion issues.
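
The idea of accessors named for DFDL types that fail in a predictable
way can be sketched as follows. This is a standalone illustration under
stated assumptions: SimpleValue and AccessorException are hypothetical
names invented here, and the return types and the exception class are
assumptions rather than the actual Daffodil signatures. Only the method
names getDecimal, getInteger, and getText come from this commit message.

```scala
object TypedAccessorSketch {
  import java.math.{BigDecimal => JBigDecimal, BigInteger => JBigInteger}

  // Hypothetical exception type: one class for all conversion failures,
  // so callers have a single thing to catch.
  final class AccessorException(msg: String) extends RuntimeException(msg)

  // Hypothetical stand-in for a simple element holding one value.
  final class SimpleValue(value: Any) {
    // Named for the DFDL type (decimal), not the implementation type.
    def getDecimal: JBigDecimal = value match {
      case d: JBigDecimal => d
      case other => throw new AccessorException(s"Not a decimal: $other")
    }

    // Named for the DFDL type (integer), not the implementation type.
    def getInteger: JBigInteger = value match {
      case i: JBigInteger => i
      case other => throw new AccessorException(s"Not an integer: $other")
    }

    // Text form of the value, whatever its type.
    def getText: String = String.valueOf(value)
  }
}
```

A caller that asks for the wrong type gets the same exception class
every time instead of a ClassCastException from an internal cast.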

DEPRECATION/COMPATIBILITY

The InfosetOutputter trait methods have changed signatures.

The types of the arguments have been replaced:

DIArray -> InfosetArray
DISimple -> InfosetSimpleElement
DIComplex -> InfosetComplexElement

This was done to hide the "DIxxx" types as they are
internal and subject to change.

Methods of DISimple that were named for implementation types (like
getBigInt, getBigDecimal, etc.) have been replaced by methods named
for the DFDL types.

These methods are replaced:

getBigDecimal -> getDecimal
getBigInt -> getInteger

These methods also changed names:

dataValueAsString -> getText

This method is new:

getNonNegativeInteger

Some methods have been removed (getStatus).

DAFFODIL-2832
mbeckerle committed Nov 14, 2023
1 parent f1cde32 commit d26a582
Showing 51 changed files with 1,876 additions and 633 deletions.
@@ -56,7 +56,7 @@ sealed abstract class ComplexTypeBase(xmlArg: Node, parentArg: SchemaComponent)

 private lazy val smg = {
 childrenForTerms.map { xmlChild =>
-ModelGroupFactory(xmlChild, this, 1, false)
+ModelGroupFactory(xmlChild, this, 1, isHidden = false)
 }
 }

@@ -147,7 +147,8 @@ object TermFactory {
 case Some(_) => ElementRef(child, lexicalParent, position)
 }
 }
-case _ => ModelGroupFactory(child, lexicalParent, position, false, nodesAlreadyTrying)
+case _ =>
+ModelGroupFactory(child, lexicalParent, position, isHidden = false, nodesAlreadyTrying)
 }
 childTerm
 }
@@ -202,8 +202,6 @@ trait SchemaComponent
 sscd
 }

-final def sscd = shortSchemaComponentDesignator
-
 /**
 * Elements only e.g., /foo/ex:bar
 */
@@ -70,6 +70,11 @@ abstract class SequenceTermBase(

 def isOrdered: Boolean

+/**
+* Overridden in sequence group ref
+*/
+def isHidden: Boolean = false
+
 }

 /**
@@ -49,6 +49,7 @@ trait SequenceTermRuntime1Mixin { self: SequenceTermBase =>
 fillByteEv,
 maybeCheckByteAndBitOrderEv,
 maybeCheckBitOrderAndCharsetEv,
+isHidden,
 )
 }

@@ -82,6 +83,7 @@ trait ChoiceBranchImpliedSequenceRuntime1Mixin { self: ChoiceBranchImpliedSequen
 FillByteUseNotAllowedEv,
 Maybe.Nope,
 Maybe.Nope,
+isHidden = false,
 )
 }

@@ -0,0 +1,223 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.daffodil.core.api

import scala.collection.mutable.ArrayBuffer
import scala.xml.Elem

import org.apache.daffodil.core.util.TestUtils
import org.apache.daffodil.io.InputSourceDataInputStream
import org.apache.daffodil.lib.util._
import org.apache.daffodil.runtime1.api.ChoiceMetadata
import org.apache.daffodil.runtime1.api.ComplexElementMetadata
import org.apache.daffodil.runtime1.api.DFDL.ParseResult
import org.apache.daffodil.runtime1.api.ElementMetadata
import org.apache.daffodil.runtime1.api.InfosetArray
import org.apache.daffodil.runtime1.api.InfosetComplexElement
import org.apache.daffodil.runtime1.api.InfosetElement
import org.apache.daffodil.runtime1.api.InfosetItem
import org.apache.daffodil.runtime1.api.InfosetSimpleElement
import org.apache.daffodil.runtime1.api.Metadata
import org.apache.daffodil.runtime1.api.MetadataHandler
import org.apache.daffodil.runtime1.api.SequenceMetadata
import org.apache.daffodil.runtime1.api.SimpleElementMetadata
import org.apache.daffodil.runtime1.infoset.InfosetOutputter
import org.apache.daffodil.runtime1.processors.DataProcessor

import org.junit.Assert.assertEquals
import org.junit.Assert.assertTrue
import org.junit.Test

class TestMetadataWalking {

def compileAndWalkMetadata(schema: Elem, mh: MetadataHandler): DataProcessor = {
val dp = TestUtils.compileSchema(schema)
assertTrue(!dp.isError)
dp.walkMetadata(mh)
dp
}

def parseAndWalkData(dp: DataProcessor, infosetOutputter: InfosetOutputter)(
data: Array[Byte],
): ParseResult = {
val isdis = InputSourceDataInputStream(data)
val res = dp.parse(isdis, infosetOutputter)
res
}

class GatherMetadata extends MetadataHandler {

private val buf = new ArrayBuffer[Metadata]();

def getResult: Seq[Metadata] = {
val res: Seq[Metadata] = buf.toVector // makes a copy
buf.clear()
res
}

override def simpleElementMetadata(m: SimpleElementMetadata): Unit = buf += m

override def startComplexElementMetadata(m: ComplexElementMetadata): Unit = buf += m

override def endComplexElementMetadata(m: ComplexElementMetadata): Unit = buf += m

override def startSequenceMetadata(m: SequenceMetadata): Unit = buf += m

override def endSequenceMetadata(m: SequenceMetadata): Unit = buf += m

override def startChoiceMetadata(m: ChoiceMetadata): Unit = buf += m

override def endChoiceMetadata(m: ChoiceMetadata): Unit = buf += m
}

class GatherData extends InfosetOutputter {

private val buf = new ArrayBuffer[InfosetItem]

def getResult: Seq[InfosetItem] = {
val res = buf.toVector
reset()
res
}

override def reset(): Unit = { buf.clear() }

override def startDocument(): Unit = {}

override def endDocument(): Unit = {}

override def startSimple(diSimple: InfosetSimpleElement): Unit = { buf += diSimple }

override def endSimple(diSimple: InfosetSimpleElement): Unit = {}

override def startComplex(complex: InfosetComplexElement): Unit = { buf += complex }

override def endComplex(complex: InfosetComplexElement): Unit = { buf += complex }

override def startArray(array: InfosetArray): Unit = { buf += array }

override def endArray(array: InfosetArray): Unit = { buf += array }
}

@Test def testMetadataWalk_DataWalk_01(): Unit = {
val gatherData = new GatherData
val gatherMetadata = new GatherMetadata
val sch = SchemaUtils.dfdlTestSchema(
<xs:include schemaLocation="/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>,
<dfdl:format ref="ex:GeneralFormat" lengthKind="implicit"/>,
<xs:element name="e1" dfdl:terminator=".">
<xs:complexType>
<xs:sequence dfdl:separator=";" dfdl:terminator=";">
<xs:element name="s1" type="xs:int" dfdl:lengthKind="delimited" maxOccurs="4" minOccurs="0" dfdl:occursCountKind="implicit"/>
</xs:sequence>
</xs:complexType>
</xs:element>,
useTNS = false,
)
val dp = compileAndWalkMetadata(sch, gatherMetadata)
val md = gatherMetadata.getResult
val mdQNames = md.map {
case e: ElementMetadata => e.toQName
case seq: SequenceMetadata => "seq"
case cho: ChoiceMetadata => "cho"
}
assertEquals("Vector(e1, seq, s1, seq, e1)", mdQNames.toString)
val parser: Array[Byte] => ParseResult = parseAndWalkData(dp, gatherData)
val inputData = "5;6;7;8;.".getBytes("utf-8")
val res = parser(inputData)
val infosetItems = gatherData.getResult
val itemQNames = infosetItems.flatMap {
case e: InfosetElement => Seq(e.metadata.toQName)
case e: InfosetArray => Seq(e.metadata.name + "_array")
case _ => Nil
}
assertEquals("Vector(e1, s1_array, s1, s1, s1, s1, s1_array, e1)", itemQNames.toString)
val itemValues = infosetItems.flatMap {
case e: InfosetSimpleElement => Seq(e.getText)
case _ => Nil
}
assertEquals("5678", itemValues.mkString)
}

/**
* Shows that there are no hidden elements to deal with in
* the metadata walk nor the data walk.
*/
@Test def testMetadataWalk_DataWalk_NoHidden(): Unit = {
val gatherData = new GatherData
val gatherMetadata = new GatherMetadata
val sch = SchemaUtils.dfdlTestSchema(
<xs:include schemaLocation="/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>,
<dfdl:format ref="ex:GeneralFormat" lengthKind="delimited"/>,
Seq(
<xs:group name="len">
<xs:sequence>
<xs:element name="len" type="xs:unsignedInt"
dfdl:outputValueCalc='{ dfdl:valueLength(../s1[1], "bytes") }'/>
</xs:sequence>
</xs:group>,
<xs:element name="e1" dfdl:terminator=".">
<xs:complexType>
<xs:choice dfdl:choiceDispatchKey='{ "ints" }'>
<xs:sequence dfdl:choiceBranchKey="strings"/>
<xs:sequence dfdl:separator=";" dfdl:choiceBranchKey="ints">
<xs:sequence dfdl:hiddenGroupRef="ex:len"/>
<xs:element name="s1" type="xs:int"
dfdl:lengthKind="explicit" dfdl:length="{ ../len }"
maxOccurs="4" minOccurs="0" dfdl:occursCountKind="implicit"/>
</xs:sequence>
</xs:choice>
</xs:complexType>
</xs:element>,
),
useTNS = false,
useDefaultNamespace = false,
elementFormDefault = "unqualified",
)
val dp = compileAndWalkMetadata(sch, gatherMetadata)
val md = gatherMetadata.getResult
val mdQNames = md.map {
case e: ElementMetadata => e.toQName
case seq: SequenceMetadata => "seq"
case cho: ChoiceMetadata => "cho"
}
assertEquals(
"Vector(ex:e1, cho, seq, seq, seq, s1, seq, cho, ex:e1)",
mdQNames.toString,
)
val parser: Array[Byte] => ParseResult = parseAndWalkData(dp, gatherData)
val inputData = "1;5;6;7;8.".getBytes("utf-8")
val res = parser(inputData)
val infosetItems = gatherData.getResult
val itemQNames = infosetItems.flatMap {
case e: InfosetElement => Seq(e.metadata.toQName)
case e: InfosetArray => Seq(e.metadata.name + "_array")
case _ => Nil
}
assertEquals(
"Vector(ex:e1, s1_array, s1, s1, s1, s1, s1_array, ex:e1)",
itemQNames.toString,
)
val itemValues = infosetItems.flatMap {
case e: InfosetSimpleElement => Seq(e.getText)
case _ => Nil
}
assertEquals("5678", itemValues.mkString)
}

}
@@ -31,7 +31,7 @@ import org.junit.Test; object INoWarn2 { ImplicitsSuppressUnusedImportWarning()
 import org.apache.daffodil.core.infoset.TestInfoset
 import org.apache.daffodil.core.util.TestUtils
 import org.apache.daffodil.io.InputSourceDataInputStream
-import org.apache.daffodil.runtime1.infoset.InfosetDocument
+import org.apache.daffodil.runtime1.api.InfosetDocument
 import org.apache.daffodil.runtime1.infoset.NullInfosetOutputter
 import org.apache.daffodil.runtime1.processors.DataProcessor
 import org.apache.daffodil.runtime1.processors.parsers.PState
@@ -21,8 +21,8 @@ import org.apache.daffodil.core.util.TestUtils
 import org.apache.daffodil.io.InputSourceDataInputStream
 import org.apache.daffodil.lib.Implicits.intercept
 import org.apache.daffodil.lib.util.SchemaUtils
+import org.apache.daffodil.runtime1.api.InfosetSimpleElement
 import org.apache.daffodil.runtime1.dpath.NodeInfo
-import org.apache.daffodil.runtime1.infoset.DISimple
 import org.apache.daffodil.runtime1.infoset.ScalaXMLInfosetInputter
 import org.apache.daffodil.runtime1.infoset.ScalaXMLInfosetOutputter

@@ -37,10 +37,10 @@ import org.junit.Test
 */
 class RedactingScalaXMLInfosetOutputter extends ScalaXMLInfosetOutputter {

-override def startSimple(diSimple: DISimple): Unit = {
-super.startSimple(diSimple)
+override def startSimple(se: InfosetSimpleElement): Unit = {
+super.startSimple(se)

-val runtimeProperties = diSimple.erd.runtimeProperties
+val runtimeProperties = se.metadata.runtimeProperties

 val redactions = Option(runtimeProperties.get("redact")).map { value => value.split(",") }
 if (redactions.isDefined) {