Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Refactoring to expose clean metadata and infoset walkers.
This refactoring was designed to support integrating Daffodil directly (without intermediate XML, JSON, or even EXI) with other data handling tools, specifically Apache Drill.

InfosetElement and the related InfosetNode traits are now in the runtime1.api package.

InfosetOutputter now has methods which take InfosetElement and InfosetArray traits as the objects passed to the handler methods. This improves the API over making DIArray, DISimple, and DIComplex visible, though it is backward incompatible.

Also removed use of DIComplex and DISimple from most SAPI and JAPI tests.

The SAX InfosetOutputter still downcasts somewhat to the DINode classes.

Added Metadata, ElementMetadata, etc. (also in the runtime1.api package), which provide limited exposure to the RuntimeData and CompilerInfo information.

Added MetadataHandler, whose methods are called by MetadataWalker, which can be invoked from DataProcessor.

Added a unit test for metadata and data walking to the core module.

Walking the runtime1 metadata is easier than walking the DSOM tree. Data fabrics like Apache Drill interface to runtime1 specifically, not to the schema compiler, so it is natural for the runtime1 metadata and data structures to drive the interfacing.

The InfosetNode types were always supposed to be the API, and the DINodes the implementation. This settles which infoset node classes should show through to SAPI and JAPI: the InfosetNode types, not the DINode types. Furthermore, the InfosetNode types can have methods to access the runtime metadata needed by walkers, InfosetOutputters, etc. These hide our infoset implementation and our runtime metadata (RuntimeData classes) implementations.

Note: Nothing has changed with InfosetInputters, as those are not needed for Apache Drill integration, which is parse-only.

BlobMethodMixin was factored out of InfosetOutputter as a shared implementation trait for the basic blob implementation.
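The handler/walker split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Daffodil's API: `Term`, `Simple`, `Complex`, `Sequence`, `Handler`, `walk`, and `ColumnCollector` are hypothetical stand-ins, and only the callback names are borrowed from the methods the unit test overrides. The walker owns the traversal and the handler only reacts to events, which is roughly what a tool like Drill would need to derive a column layout.

```scala
// Stand-in types for illustration only. These are NOT the runtime1.api
// traits; they model just enough to show the handler/walker division.
object MetadataWalkSketch {

  // A tiny schema-shape model.
  sealed trait Term
  final case class Simple(name: String, dfdlType: String) extends Term
  final case class Complex(name: String, child: Term) extends Term
  final case class Sequence(children: List[Term]) extends Term

  // Callback interface; method names mirror those overridden in the unit test.
  trait Handler {
    def simpleElementMetadata(m: Simple): Unit
    def startComplexElementMetadata(m: Complex): Unit
    def endComplexElementMetadata(m: Complex): Unit
    def startSequenceMetadata(m: Sequence): Unit
    def endSequenceMetadata(m: Sequence): Unit
  }

  // The walker decides the visiting order; handlers never recurse themselves.
  def walk(t: Term, h: Handler): Unit = t match {
    case s: Simple => h.simpleElementMetadata(s)
    case c: Complex =>
      h.startComplexElementMetadata(c)
      walk(c.child, h)
      h.endComplexElementMetadata(c)
    case q: Sequence =>
      h.startSequenceMetadata(q)
      q.children.foreach(walk(_, h))
      h.endSequenceMetadata(q)
  }

  // Example handler: collect dotted column paths with their type names.
  final class ColumnCollector extends Handler {
    private var path = List.empty[String] // innermost element name first
    private val cols = scala.collection.mutable.ArrayBuffer.empty[String]

    def columns: Vector[String] = cols.toVector

    def simpleElementMetadata(m: Simple): Unit = {
      val qualified = (m.name :: path).reverse.mkString(".")
      cols += s"$qualified:${m.dfdlType}"
    }
    def startComplexElementMetadata(m: Complex): Unit = { path = m.name :: path }
    def endComplexElementMetadata(m: Complex): Unit = { path = path.tail }
    def startSequenceMetadata(m: Sequence): Unit = ()
    def endSequenceMetadata(m: Sequence): Unit = ()
  }

  // Convenience wrapper: walk a schema shape and return its columns.
  def columnsOf(root: Term): Vector[String] = {
    val h = new ColumnCollector
    walk(root, h)
    h.columns
  }
}
```

With the real API the traversal would presumably be started from the DataProcessor (the unit test calls `dp.walkMetadata(mh)`), so only the handler half would be user code.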
Added features to SchemaUtils to avoid the "tns" prefix definition (which is now officially frowned upon).

Added isHidden to SequenceRuntimeData. This is needed to avoid walking hidden elements that appear in metadata structures but are not relevant, as they do not appear in InfosetOutputter events.

Improved Infoset API access to simple type methods. They now use the DFDL type names, not the underlying implementation type names, so a decimal is accessed via getDecimal, not getBigDecimal. Improved the Javadoc accordingly. The methods throw a predictable exception on conversion issues.

DEPRECATION/COMPATIBILITY

The InfosetOutputter trait methods have changed signatures. The types of the arguments have been replaced:

    DIArray -> InfosetArray
    DISimple -> InfosetSimpleElement
    DIComplex -> InfosetComplexElement

This was done to hide the "DIxxx" types, as they are internal and subject to change.

Methods of DISimple that were named for implementation types (like getBigInt, getBigDecimal, etc.) have been replaced by methods named for the DFDL types. These methods are replaced:

    getBigDecimal -> getDecimal
    getBigInt -> getInteger

This method was also renamed:

    dataValueAsString -> getText

This method is new:

    getNonNegativeInteger

Some methods have been removed (getStatus).

DAFFODIL-2832
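To make the renamed accessors concrete, here is a small sketch of what "methods named for the DFDL types" with a predictable failure mode could look like. Everything in it is an assumption made for illustration: `SimpleValue` and `ValueAccessException` are hypothetical names, the commit message does not say which exception the real methods throw, and the return types shown (`java.math.BigDecimal`, `java.math.BigInteger`) are a guess based on the old method names.

```scala
object SimpleValueSketch {
  import java.math.{ BigDecimal => JBigDecimal, BigInteger => JBigInteger }

  // Hypothetical exception type; the real API's exception is not named here.
  final class ValueAccessException(msg: String) extends RuntimeException(msg)

  // Hypothetical stand-in for a simple infoset element holding one value.
  final class SimpleValue(value: Any) {

    // Was dataValueAsString in the old naming.
    def getText: String = value.toString

    // Was getBigDecimal: named for xs:decimal, not for the JVM class.
    def getDecimal: JBigDecimal = value match {
      case d: JBigDecimal => d
      case other => fail("decimal", other)
    }

    // Was getBigInt: named for xs:integer.
    def getInteger: JBigInteger = value match {
      case i: JBigInteger => i
      case other => fail("integer", other)
    }

    // New accessor in this sketch: an integer that must not be negative.
    def getNonNegativeInteger: JBigInteger = {
      val i = getInteger
      if (i.signum() < 0) fail("nonNegativeInteger", i) else i
    }

    // Every mismatch surfaces as the same exception type.
    private def fail(dfdlType: String, v: Any): Nothing =
      throw new ValueAccessException(s"value '$v' is not a DFDL $dfdlType")
  }
}
```

A caller migrating from the old names would replace `getBigInt` with `getInteger` and `dataValueAsString` with `getText`, and could catch a single exception type around typed access instead of handling assorted cast failures.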
Showing 51 changed files with 1,876 additions and 633 deletions.
223 changes: 223 additions & 0 deletions
daffodil-core/src/test/scala/org/apache/daffodil/core/api/TestMetadataWalking.scala
@@ -0,0 +1,223 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.core.api
+
+import scala.collection.mutable.ArrayBuffer
+import scala.xml.Elem
+
+import org.apache.daffodil.core.util.TestUtils
+import org.apache.daffodil.io.InputSourceDataInputStream
+import org.apache.daffodil.lib.util._
+import org.apache.daffodil.runtime1.api.ChoiceMetadata
+import org.apache.daffodil.runtime1.api.ComplexElementMetadata
+import org.apache.daffodil.runtime1.api.DFDL.ParseResult
+import org.apache.daffodil.runtime1.api.ElementMetadata
+import org.apache.daffodil.runtime1.api.InfosetArray
+import org.apache.daffodil.runtime1.api.InfosetComplexElement
+import org.apache.daffodil.runtime1.api.InfosetElement
+import org.apache.daffodil.runtime1.api.InfosetItem
+import org.apache.daffodil.runtime1.api.InfosetSimpleElement
+import org.apache.daffodil.runtime1.api.Metadata
+import org.apache.daffodil.runtime1.api.MetadataHandler
+import org.apache.daffodil.runtime1.api.SequenceMetadata
+import org.apache.daffodil.runtime1.api.SimpleElementMetadata
+import org.apache.daffodil.runtime1.infoset.InfosetOutputter
+import org.apache.daffodil.runtime1.processors.DataProcessor
+
+import org.junit.Assert.assertEquals
+import org.junit.Assert.assertTrue
+import org.junit.Test
+
+class TestMetadataWalking {
+
+  def compileAndWalkMetadata(schema: Elem, mh: MetadataHandler): DataProcessor = {
+    val dp = TestUtils.compileSchema(schema)
+    assertTrue(!dp.isError)
+    dp.walkMetadata(mh)
+    dp
+  }
+
+  def parseAndWalkData(dp: DataProcessor, infosetOutputter: InfosetOutputter)(
+    data: Array[Byte],
+  ): ParseResult = {
+    val isdis = InputSourceDataInputStream(data)
+    val res = dp.parse(isdis, infosetOutputter)
+    res
+  }
+
+  class GatherMetadata extends MetadataHandler {
+
+    private val buf = new ArrayBuffer[Metadata]();
+
+    def getResult: Seq[Metadata] = {
+      val res: Seq[Metadata] = buf.toVector // makes a copy
+      buf.clear()
+      res
+    }
+
+    override def simpleElementMetadata(m: SimpleElementMetadata): Unit = buf += m
+
+    override def startComplexElementMetadata(m: ComplexElementMetadata): Unit = buf += m
+
+    override def endComplexElementMetadata(m: ComplexElementMetadata): Unit = buf += m
+
+    override def startSequenceMetadata(m: SequenceMetadata): Unit = buf += m
+
+    override def endSequenceMetadata(m: SequenceMetadata): Unit = buf += m
+
+    override def startChoiceMetadata(m: ChoiceMetadata): Unit = buf += m
+
+    override def endChoiceMetadata(m: ChoiceMetadata): Unit = buf += m
+  }
+
+  class GatherData extends InfosetOutputter {
+
+    private val buf = new ArrayBuffer[InfosetItem]
+
+    def getResult: Seq[InfosetItem] = {
+      val res = buf.toVector
+      reset()
+      res
+    }
+
+    override def reset(): Unit = { buf.clear() }
+
+    override def startDocument(): Unit = {}
+
+    override def endDocument(): Unit = {}
+
+    override def startSimple(diSimple: InfosetSimpleElement): Unit = { buf += diSimple }
+
+    override def endSimple(diSimple: InfosetSimpleElement): Unit = {}
+
+    override def startComplex(complex: InfosetComplexElement): Unit = { buf += complex }
+
+    override def endComplex(complex: InfosetComplexElement): Unit = { buf += complex }
+
+    override def startArray(array: InfosetArray): Unit = { buf += array }
+
+    override def endArray(array: InfosetArray): Unit = { buf += array }
+  }
+
+  @Test def testMetadataWalk_DataWalk_01(): Unit = {
+    val gatherData = new GatherData
+    val gatherMetadata = new GatherMetadata
+    val sch = SchemaUtils.dfdlTestSchema(
+      <xs:include schemaLocation="/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>,
+      <dfdl:format ref="ex:GeneralFormat" lengthKind="implicit"/>,
+      <xs:element name="e1" dfdl:terminator=".">
+        <xs:complexType>
+          <xs:sequence dfdl:separator=";" dfdl:terminator=";">
+            <xs:element name="s1" type="xs:int" dfdl:lengthKind="delimited" maxOccurs="4" minOccurs="0" dfdl:occursCountKind="implicit"/>
+          </xs:sequence>
+        </xs:complexType>
+      </xs:element>,
+      useTNS = false,
+    )
+    val dp = compileAndWalkMetadata(sch, gatherMetadata)
+    val md = gatherMetadata.getResult
+    val mdQNames = md.map {
+      case e: ElementMetadata => e.toQName
+      case seq: SequenceMetadata => "seq"
+      case cho: ChoiceMetadata => "cho"
+    }
+    assertEquals("Vector(e1, seq, s1, seq, e1)", mdQNames.toString)
+    val parser: Array[Byte] => ParseResult = parseAndWalkData(dp, gatherData)
+    val inputData = "5;6;7;8;.".getBytes("utf-8")
+    val res = parser(inputData)
+    val infosetItems = gatherData.getResult
+    val itemQNames = infosetItems.flatMap {
+      case e: InfosetElement => Seq(e.metadata.toQName)
+      case e: InfosetArray => Seq(e.metadata.name + "_array")
+      case _ => Nil
+    }
+    assertEquals("Vector(e1, s1_array, s1, s1, s1, s1, s1_array, e1)", itemQNames.toString)
+    val itemValues = infosetItems.flatMap {
+      case e: InfosetSimpleElement => Seq(e.getText)
+      case _ => Nil
+    }
+    assertEquals("5678", itemValues.mkString)
+  }
+
+  /**
+   * Shows that there are no hidden elements to deal with in
+   * the metadata walk nor the data walk.
+   */
+  @Test def testMetadataWalk_DataWalk_NoHidden(): Unit = {
+    val gatherData = new GatherData
+    val gatherMetadata = new GatherMetadata
+    val sch = SchemaUtils.dfdlTestSchema(
+      <xs:include schemaLocation="/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>,
+      <dfdl:format ref="ex:GeneralFormat" lengthKind="delimited"/>,
+      Seq(
+        <xs:group name="len">
+          <xs:sequence>
+            <xs:element name="len" type="xs:unsignedInt"
+              dfdl:outputValueCalc='{ dfdl:valueLength(../s1[1], "bytes") }'/>
+          </xs:sequence>
+        </xs:group>,
+        <xs:element name="e1" dfdl:terminator=".">
+          <xs:complexType>
+            <xs:choice dfdl:choiceDispatchKey='{ "ints" }'>
+              <xs:sequence dfdl:choiceBranchKey="strings"/>
+              <xs:sequence dfdl:separator=";" dfdl:choiceBranchKey="ints">
+                <xs:sequence dfdl:hiddenGroupRef="ex:len"/>
+                <xs:element name="s1" type="xs:int"
+                  dfdl:lengthKind="explicit" dfdl:length="{ ../len }"
+                  maxOccurs="4" minOccurs="0" dfdl:occursCountKind="implicit"/>
+              </xs:sequence>
+            </xs:choice>
+          </xs:complexType>
+        </xs:element>,
+      ),
+      useTNS = false,
+      useDefaultNamespace = false,
+      elementFormDefault = "unqualified",
+    )
+    val dp = compileAndWalkMetadata(sch, gatherMetadata)
+    val md = gatherMetadata.getResult
+    val mdQNames = md.map {
+      case e: ElementMetadata => e.toQName
+      case seq: SequenceMetadata => "seq"
+      case cho: ChoiceMetadata => "cho"
+    }
+    assertEquals(
+      "Vector(ex:e1, cho, seq, seq, seq, s1, seq, cho, ex:e1)",
+      mdQNames.toString,
+    )
+    val parser: Array[Byte] => ParseResult = parseAndWalkData(dp, gatherData)
+    val inputData = "1;5;6;7;8.".getBytes("utf-8")
+    val res = parser(inputData)
+    val infosetItems = gatherData.getResult
+    val itemQNames = infosetItems.flatMap {
+      case e: InfosetElement => Seq(e.metadata.toQName)
+      case e: InfosetArray => Seq(e.metadata.name + "_array")
+      case _ => Nil
+    }
+    assertEquals(
+      "Vector(ex:e1, s1_array, s1, s1, s1, s1, s1_array, ex:e1)",
+      itemQNames.toString,
+    )
+    val itemValues = infosetItems.flatMap {
+      case e: InfosetSimpleElement => Seq(e.getText)
+      case _ => Nil
+    }
+    assertEquals("5678", itemValues.mkString)
+  }
+
+}
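The data-walk assertions in this test expect a flat event order (`e1, s1_array, s1, s1, s1, s1, s1_array, e1`). A consumer that wants nested records, as a Drill-style integration might, can rebuild the nesting from those start/end events with a stack. The following is a rough sketch under stated assumptions: `Event`, `Node`, and `build` are hypothetical stand-ins, and events are modeled as plain case classes, whereas the real InfosetOutputter delivers them as method calls.

```scala
object InfosetEventSketch {

  // Stand-in event model, one case per outputter callback of interest.
  sealed trait Event
  final case class StartComplex(name: String) extends Event
  final case class EndComplex(name: String) extends Event
  final case class StartArray(name: String) extends Event
  final case class EndArray(name: String) extends Event
  final case class Simple(name: String, text: String) extends Event

  // Nested result: a leaf value, or a named node with ordered children.
  sealed trait Node
  final case class Leaf(name: String, text: String) extends Node
  final case class Branch(name: String, children: Vector[Node]) extends Node

  def build(events: Seq[Event]): Option[Node] = {
    // Each stack frame is an open branch: its name and the children so far.
    var stack = List.empty[(String, Vector[Node])]
    var root = Option.empty[Node]

    // Add a finished node to the innermost open branch, or make it the root.
    def attach(n: Node): Unit = stack match {
      case (name, kids) :: rest => stack = (name, kids :+ n) :: rest
      case Nil => root = Some(n)
    }

    // Pop the innermost open branch and attach it to its parent.
    def close(): Unit = stack match {
      case (name, kids) :: rest =>
        stack = rest
        attach(Branch(name, kids))
      case Nil =>
        throw new IllegalStateException("end event without matching start")
    }

    events.foreach {
      case StartComplex(n) => stack = (n, Vector.empty[Node]) :: stack
      case StartArray(n) => stack = (n + "_array", Vector.empty[Node]) :: stack
      case EndComplex(_) | EndArray(_) => close()
      case Simple(n, t) => attach(Leaf(n, t))
    }
    root
  }
}
```

The stack keeps the handler itself free of recursion, which suits a push-style API where the parser, not the consumer, controls the traversal.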