Code for Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition [JBI]

GanjinZero/GTS

Introduction

This repository contains the code for the paper "Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition".

Usage

The n-gram statistics and the trained BERT classifier cannot be made public because of privacy policy.

Use from the command line

python -m graces -s 饮食可睡眠可大便不规律小便正常体重无明显减轻
python -m graces -f ./input.txt -o ./output.txt

Import from Python

import graces
graces.cut("饮食可,睡眠可,大便不规律,小便正常,体重无明显减轻。") # Segment a single sentence
graces.cut_k("饮食可,睡眠可,大便不规律,小便正常,体重无明显减轻。", k=8) # Segment a single sentence with fixed word count k.
graces.cut_file("./input.txt", "./output.txt") # Segment a file

Data

We asked MD students to annotate coarse-level and fine-level word segmentations on EHRs for validation. This data is used for validation only, not for training.

  • dev.txt: Unlabeled EHRs from part of CCKS2019.
  • dev_label_coarse.txt: Coarse-level word segmentation labels.
  • dev_label_fine.txt: Fine-level word segmentation labels.
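
To compare a segmentation against either label file, word-level precision, recall, and F1 over character spans is a common choice. The following is a minimal sketch, not the evaluation script used in the paper. It assumes each line of a label file holds one sentence with words separated by whitespace, which the README does not confirm. The helper names `read_segmented`, `word_spans`, and `segmentation_prf` are hypothetical.

```python
from typing import List, Set, Tuple


def read_segmented(path: str) -> List[List[str]]:
    """Read one sentence per line; words are ASSUMED to be whitespace-separated."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f if line.strip()]


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a word sequence to the set of (start, end) character spans it covers."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Word-level precision, recall, and F1; a word is correct if its span matches."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    n_gold = n_pred = n_correct = 0
    for gold_words, pred_words in zip(gold, pred):
        if "".join(gold_words) != "".join(pred_words):
            raise ValueError("gold and pred segment different text")
        gold_spans = word_spans(gold_words)
        pred_spans = word_spans(pred_words)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
        n_correct += len(gold_spans & pred_spans)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the first sentence is under-segmented, the second is exact.
gold = [["饮食", "可"], ["小便", "正常"]]
pred = [["饮食可"], ["小便", "正常"]]
precision, recall, f1 = segmentation_prf(gold, pred)
```

With real data, `gold` would come from `read_segmented("dev_label_coarse.txt")` or `read_segmented("dev_label_fine.txt")`, provided the files match the assumed format, and `pred` from whatever segmentation output is being evaluated.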

Citation

If you find our code or data useful, please cite:

@article{YUAN2020103542,
title = "Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition",
journal = "Journal of Biomedical Informatics",
volume = "110",
pages = "103542",
year = "2020",
issn = "1532-0464",
doi = "https://doi.org/10.1016/j.jbi.2020.103542",
url = "http://www.sciencedirect.com/science/article/pii/S1532046420301702",
author = "Zheng Yuan and Yuanhao Liu and Qiuyang Yin and Boyao Li and Xiaobin Feng and Guoming Zhang and Sheng Yu",
}
