
Improve parse_paragraphs() #54

@nils-herrmann

Parsing of the JATS XML could be improved. Here are some observations:

  1. Some documents (e.g. doi=10.1007/s40708-014-0001-z) use plain <p> elements without an id attribute.
  2. Some documents put paragraphs inside lists:
     <list list-type="order">
       <list-item>
         <p id="Par6">
           The normal traffic data is more than abnormal traffic for the training sample, and the data imbalance problem will lead to low precision.
         </p>
       </list-item>
       <list-item>
       ...
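A minimal sketch of how both cases could be handled, assuming the article body is parsed with Python's standard xml.etree.ElementTree. The function name mirrors parse_paragraphs(), but its signature, the helpers, the SAMPLE document and the returned keys (id, in_list, text) are hypothetical illustrations, not the project's current API. Because JATS also allows a <list> inside a <p>, the sketch skips nested paragraphs when building a parent's text so list items are not counted twice.

```python
import xml.etree.ElementTree as ET


def _ancestors(elem, parents):
    """Yield the ancestors of elem, nearest first, using a child->parent map."""
    while elem in parents:
        elem = parents[elem]
        yield elem


def _own_text(p):
    """Whitespace-normalised text of <p>, minus child subtrees that hold a nested <p>.

    Nested paragraphs (e.g. <p> inside <list-item>) are collected separately,
    so skipping them here avoids duplicating their text in the parent.
    """
    parts = [p.text or ""]
    for child in p:
        # Element.iter("p") also yields the child itself if it is a <p>.
        if next(child.iter("p"), None) is None:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


def parse_paragraphs(body):
    """Hypothetical sketch: collect every <p> under body in document order,
    whether or not it carries an id and whether or not it sits in a list."""
    parents = {child: parent for parent in body.iter() for child in parent}
    paragraphs = []
    for p in body.iter("p"):
        text = _own_text(p)
        if not text:
            continue
        paragraphs.append({
            "id": p.get("id"),  # None for documents that use bare <p>
            "in_list": any(a.tag == "list-item" for a in _ancestors(p, parents)),
            "text": text,
        })
    return paragraphs


# Hypothetical sample combining both observations from this issue.
SAMPLE = """<body><sec>
<p>A paragraph without an id attribute.</p>
<p id="Par5">Two cases:<list list-type="order">
<list-item><p id="Par6">The normal traffic data is more than abnormal traffic.</p></list-item>
</list></p>
</sec></body>"""

if __name__ == "__main__":
    for para in parse_paragraphs(ET.fromstring(SAMPLE)):
        print(para)
```

Keeping the id as None instead of dropping id-less paragraphs lets documents like the one cited above still produce output, and the in_list flag preserves the list context without changing the flat paragraph order.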
