Add EAF controlled vocabulary to metadata #344

@William-N-Havard

Is your feature request related to a problem? Please describe.
An EAF tier can be assigned a controlled vocabulary: a fixed set of labels, defined by the creator of the EAF file, that annotators use during the annotation campaign. This prevents annotators from adding custom labels, whether intentionally or by mistake.

First, when importing annotations belonging to a new type of tier (see issue #343), it would be good to check that every annotation uses a label defined in the controlled vocabulary (better safe than sorry!).

Second, it would be nice to also import the description of each label in the controlled vocabulary and store it somewhere. These descriptions are stored directly in the EAF file. Storing them would let users of the dataset understand the meaning of the codes used during the annotation campaign.

<CONTROLLED_VOCABULARY CV_ID="vcm">
    <DESCRIPTION LANG_REF="und">Simplified subset of infant vocal maturity classes (distinguishing between variegated and non-variegated syllables)</DESCRIPTION>
    <CV_ENTRY_ML CVE_ID="cveid_e7300257-f12a-479f-90f0-c2fefbf99a26">
        <CVE_VALUE DESCRIPTION="Crying" LANG_REF="und">Y</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_ae00bfde-d4bb-499e-8c63-81c4459f5b8a">
        <CVE_VALUE DESCRIPTION="Laughing" LANG_REF="und">L</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_df01bf24-04f4-4cff-9bc4-ca92a0ca945f">
        <CVE_VALUE DESCRIPTION="Non-canonical non-variegated syllable(s)" LANG_REF="und">A</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_8675a2cf-bb35-476c-a602-8b911eb2a845">
        <CVE_VALUE DESCRIPTION="Non-canonical variegated syllable(s)" LANG_REF="und">P</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_f1ad7cdd-4916-4914-a59a-a33d0d7052cc">
        <CVE_VALUE DESCRIPTION="Canonical variegated syllable(s)" LANG_REF="und">V</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_09a9bb98-31a9-4afd-9ed7-d4fc7af658a6">
        <CVE_VALUE DESCRIPTION="Canonical non-variegated syllable(s)" LANG_REF="und">W</CVE_VALUE>
    </CV_ENTRY_ML>
    <CV_ENTRY_ML CVE_ID="cveid_ee07af47-c822-4fb3-80d3-d842d80272b7">
        <CVE_VALUE DESCRIPTION="Uncertain" LANG_REF="und">U</CVE_VALUE>
    </CV_ENTRY_ML>
</CONTROLLED_VOCABULARY>
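
To illustrate how these descriptions could be read, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the multilingual layout shown above (`CV_ENTRY_ML` / `CVE_VALUE`) and ignores other EAF variants. The function name `read_controlled_vocabularies`, the shortened snippet, and the returned dictionary layout are hypothetical, not part of any existing importer.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of the snippet above (entry IDs are made up).
EAF_SNIPPET = """
<ANNOTATION_DOCUMENT>
    <CONTROLLED_VOCABULARY CV_ID="vcm">
        <DESCRIPTION LANG_REF="und">Simplified subset of infant vocal maturity classes</DESCRIPTION>
        <CV_ENTRY_ML CVE_ID="cveid_1">
            <CVE_VALUE DESCRIPTION="Crying" LANG_REF="und">Y</CVE_VALUE>
        </CV_ENTRY_ML>
        <CV_ENTRY_ML CVE_ID="cveid_2">
            <CVE_VALUE DESCRIPTION="Laughing" LANG_REF="und">L</CVE_VALUE>
        </CV_ENTRY_ML>
    </CONTROLLED_VOCABULARY>
</ANNOTATION_DOCUMENT>
"""


def read_controlled_vocabularies(root):
    """Map each CV_ID to its description and its {label: description} entries."""
    vocabularies = {}
    for cv in root.iter("CONTROLLED_VOCABULARY"):
        entries = {}
        for value in cv.iter("CVE_VALUE"):
            label = (value.text or "").strip()
            entries[label] = value.get("DESCRIPTION", "")
        vocabularies[cv.get("CV_ID")] = {
            "description": cv.findtext("DESCRIPTION", default="").strip(),
            "entries": entries,
        }
    return vocabularies


vocabularies = read_controlled_vocabularies(ET.fromstring(EAF_SNIPPET.strip()))
print(vocabularies["vcm"]["entries"])
```

A dictionary of this shape could then be serialised alongside the rest of the metadata; where exactly it should live is left open here.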

Describe the solution you'd like
Check annotation labels against the controlled vocabulary when importing an EAF file, and add the descriptions of the controlled vocabulary labels to the metadata.
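
As a rough sketch of what the check could look like, the snippet below collects every annotation whose label is missing from the controlled vocabulary attached to its tier. It assumes the usual EAF linkage, where a `TIER` points to a `LINGUISTIC_TYPE` through `LINGUISTIC_TYPE_REF` and the linguistic type names its vocabulary through `CONTROLLED_VOCABULARY_REF`. The function name `find_invalid_labels`, the tier ID `vcm@CHI`, and the sample annotations are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative document: one tier, two annotations, one of them ("X") invalid.
EAF_SNIPPET = """
<ANNOTATION_DOCUMENT>
    <TIER LINGUISTIC_TYPE_REF="vcm" TIER_ID="vcm@CHI">
        <ANNOTATION>
            <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
                <ANNOTATION_VALUE>Y</ANNOTATION_VALUE>
            </ALIGNABLE_ANNOTATION>
        </ANNOTATION>
        <ANNOTATION>
            <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
                <ANNOTATION_VALUE>X</ANNOTATION_VALUE>
            </ALIGNABLE_ANNOTATION>
        </ANNOTATION>
    </TIER>
    <LINGUISTIC_TYPE CONTROLLED_VOCABULARY_REF="vcm" LINGUISTIC_TYPE_ID="vcm"/>
    <CONTROLLED_VOCABULARY CV_ID="vcm">
        <CV_ENTRY_ML CVE_ID="cveid_1">
            <CVE_VALUE DESCRIPTION="Crying" LANG_REF="und">Y</CVE_VALUE>
        </CV_ENTRY_ML>
        <CV_ENTRY_ML CVE_ID="cveid_2">
            <CVE_VALUE DESCRIPTION="Laughing" LANG_REF="und">L</CVE_VALUE>
        </CV_ENTRY_ML>
    </CONTROLLED_VOCABULARY>
</ANNOTATION_DOCUMENT>
"""


def find_invalid_labels(root):
    """Return (tier_id, label) pairs whose label is not in the tier's vocabulary."""
    # Allowed labels per controlled vocabulary.
    allowed = {
        cv.get("CV_ID"): {(v.text or "").strip() for v in cv.iter("CVE_VALUE")}
        for cv in root.iter("CONTROLLED_VOCABULARY")
    }
    # Linguistic types that reference a controlled vocabulary.
    type_to_cv = {
        lt.get("LINGUISTIC_TYPE_ID"): lt.get("CONTROLLED_VOCABULARY_REF")
        for lt in root.iter("LINGUISTIC_TYPE")
        if lt.get("CONTROLLED_VOCABULARY_REF")
    }
    invalid = []
    for tier in root.iter("TIER"):
        cv_id = type_to_cv.get(tier.get("LINGUISTIC_TYPE_REF"))
        if cv_id is None:
            continue  # tier is not constrained by a controlled vocabulary
        for value in tier.iter("ANNOTATION_VALUE"):
            label = (value.text or "").strip()
            if label not in allowed.get(cv_id, set()):
                invalid.append((tier.get("TIER_ID"), label))
    return invalid


invalid = find_invalid_labels(ET.fromstring(EAF_SNIPPET.strip()))
print(invalid)
```

Note that this sketch also flags empty annotation values, since an empty string is not a vocabulary entry. Whether the importer should raise an error, warn, or skip such annotations is a design decision left open here.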