SeqWORDS package

SeqWORDS is an unsupervised Chinese word segmentation method that does not require a dictionary. This package is a Python implementation of the SeqWORDS algorithm.

Installation

To install this package, run the command below in a terminal.

pip install SeqWORDS

Usage

import SeqWORDS

corpus = ...  # YOUR TARGET CORPUS
# Create a SW object
SW = SeqWORDS.WDMtwseq(corpus,
                       tauL=10, tauF=3,
                       iter_time_total=10, convergeThld=0.1,
                       useProbThld1=10e-10, useProbThld2=10e-10)
# Run the EM algorithm
SW.run()
# Segment the corpus
SW.cut(connectThld=0.5)

Parameters

parameter	type	description
`tauL`	Int	maximum word length: the longest word is assumed to contain tauL characters
`tauF`	Int	words whose occurrence frequency is lower than tauF are removed from the initial dictionary
`iter_time_total`	Int	maximum number of EM iterations
`convergeThld`	Float	EM convergence threshold
`useProbThld1`	Float	minimum use probability of a single word
`useProbThld2`	Float	minimum use probability of a two-word sequence
`connectThld`	Float	if alpha is greater than connectThld, the two words are combined
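To make the roles of `tauL` and `tauF` more concrete, here is a minimal sketch of how an initial candidate dictionary could be built. It is not the package's own code: the function name `build_initial_dict` is hypothetical, the corpus is assumed to be a list of strings, and `tauF` is interpreted as a raw occurrence count, which may differ from what SeqWORDS does internally.

```python
from collections import Counter


def build_initial_dict(corpus, tauL=10, tauF=3):
    """Count every substring of up to tauL characters and keep the frequent ones.

    Hypothetical helper for illustration; corpus is assumed to be an
    iterable of strings (for example, sentences).
    """
    counts = Counter()
    for sentence in corpus:
        for start in range(len(sentence)):
            for length in range(1, tauL + 1):
                if start + length > len(sentence):
                    break
                counts[sentence[start:start + length]] += 1
    # Single characters are always kept so that any sentence can still be
    # segmented; longer candidates need at least tauF occurrences.
    return {w: c for w, c in counts.items() if len(w) == 1 or c >= tauF}


# Toy corpus, invented for this example.
toy_corpus = ["寶玉笑道", "寶玉聽了", "黛玉笑道"]
initial_dict = build_initial_dict(toy_corpus, tauL=2, tauF=2)
print(sorted(w for w in initial_dict if len(w) > 1))
```

A larger `tauL` lets longer candidates in at the cost of a bigger dictionary, while a larger `tauF` prunes rare candidates before the EM step.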

Example

Story of the Stone

Story of the Stone, also called Dream of the Red Chamber, was composed by Xueqin Cao in the 18th century, during the Qing dynasty. The novel features a massive number of characters.

Results

Below is a word cloud showing the most frequent words. "寶玉" is the largest word in the cloud.
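The frequencies behind such a word cloud can be computed with a simple counter. The sketch below assumes the segmentation result is available as a flat list of tokens; the actual return type of `SW.cut()` may differ, and both `top_words` and the token list are made up for illustration.

```python
from collections import Counter


def top_words(tokens, n=10, min_len=2):
    """Return the n most frequent tokens with at least min_len characters."""
    counts = Counter(t for t in tokens if len(t) >= min_len)
    return counts.most_common(n)


# Hypothetical segmented output, not real SeqWORDS results.
tokens = ["寶玉", "笑", "道", "黛玉", "寶玉", "聽", "了", "寶玉", "黛玉", "賈母"]
print(top_words(tokens, n=2))
```

Filtering out single-character tokens is a design choice here: function words tend to dominate the counts otherwise.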

Below is a PCA projection of word vectors. The plot contains 51 words: "寶玉" and the 50 words most related to it. Among these words, 42 are names that are strongly related to "寶玉".
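Selecting the words most related to a target is commonly done by ranking word vectors by cosine similarity. The sketch below shows that ranking step in plain Python; it assumes cosine similarity as the relatedness measure, which the README does not confirm, and the function names and toy vectors are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_related(vectors, target, k=50):
    """Return the k words whose vectors are closest to the target's vector."""
    scores = [(w, cosine(vectors[target], v))
              for w, v in vectors.items() if w != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy 2-dimensional vectors; real word vectors have many more dimensions.
vectors = {
    "寶玉": [1.0, 0.2],
    "黛玉": [0.9, 0.2],
    "寶釵": [0.8, 0.1],
    "石頭": [0.0, 1.0],
}
for word, score in most_related(vectors, "寶玉", k=2):
    print(word, round(score, 3))
```

The selected words and the target can then be projected to two dimensions with PCA for plotting.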

License

MIT
