anaGo

anaGo is a Keras implementation of sequence labeling: an easy-to-use bidirectional LSTM-CRF with state-of-the-art performance.

anaGo can perform Named Entity Recognition (NER), Part-of-Speech (POS) tagging, semantic role labeling (SRL), and so on, for many languages. For example, the following picture shows Named Entity Recognition in English:

The following picture shows Named Entity Recognition in Japanese:

Similarly, you can solve your own task (NER, POS tagging, ...) in your own language. You don't have to define any features; you only have to prepare input and output data. :)

Features

anaGo supports the following features:

  • training a model without any hand-crafted features
  • defining a custom model
  • downloading pre-trained models

Install

To install anaGo, simply run:

$ pip install anago

or install from the repository:

$ git clone https://github.com/Hironsan/anago.git
$ cd anago
$ pip install -r requirements.txt

Data and Word Vectors

Training data is in TSV format: one token and its tag per line, with a blank line separating sentences. The following text is an example of training data:

EU	B-ORG
rejects	O
German	B-MISC
call	O
to	O
boycott	O
British	B-MISC
lamb	O
.	O

Peter	B-PER
Blackburn	I-PER
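
This format is simple to parse by hand if you ever need to. The following is a minimal sketch (the helper name read_conll is illustrative only; anaGo's own load_data_and_labels, used below, does this for you):

def read_conll(filename):
    """Read a TSV file of token<TAB>tag lines; blank lines separate sentences."""
    sentences, labels = [], []
    words, tags = [], []
    with open(filename, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if not line:
                # A blank line closes the current sentence.
                if words:
                    sentences.append(words)
                    labels.append(tags)
                    words, tags = [], []
            else:
                word, tag = line.split('\t')
                words.append(word)
                tags.append(tag)
    # Handle a file that does not end with a blank line.
    if words:
        sentences.append(words)
        labels.append(tags)
    return sentences, labels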

anaGo supports pre-trained word embeddings like GloVe vectors.
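
For example, GloVe vectors can be read into a word-to-vector dictionary as in the sketch below (the file name glove.6B.100d.txt is an assumption; use whichever GloVe file you downloaded). How the loaded vectors are then passed to the model depends on the anaGo version you use.

import numpy as np

def load_glove(filename):
    """Load GloVe vectors into a dict mapping word -> numpy array."""
    embeddings = {}
    with open(filename, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            # First field is the word, the rest are the vector components.
            embeddings[parts[0]] = np.asarray(parts[1:], dtype='float32')
    return embeddings

embeddings = load_glove('glove.6B.100d.txt')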

Get Started

Import

First, import the necessary modules:

import anago
from anago.reader import load_data_and_labels

Loading data

After importing the modules, load the training, validation, and test datasets:

x_train, y_train = load_data_and_labels('train.txt')
x_valid, y_valid = load_data_and_labels('valid.txt')
x_test, y_test = load_data_and_labels('test.txt')
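
load_data_and_labels returns parallel lists: each element of x_train is the list of words in one sentence, and the corresponding element of y_train is the list of its tags. For instance, assuming train.txt contains the example data shown above:

>>> x_train[0]
['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']
>>> y_train[0]
['B-ORG', 'O', 'B-MISC', 'O', 'O', 'O', 'B-MISC', 'O', 'O']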

Now we are ready for training :)

Training a model

Let's train a model by calling the train method:

model = anago.Sequence()
model.train(x_train, y_train, x_valid, y_valid)

If training is progressing normally, a progress bar will be displayed:

...
Epoch 3/15
702/703 [============================>.] - ETA: 0s - loss: 60.0129 - f1: 89.70
703/703 [==============================] - 319s - loss: 59.9278   
Epoch 4/15
702/703 [============================>.] - ETA: 0s - loss: 59.9268 - f1: 90.03
703/703 [==============================] - 324s - loss: 59.8417   
Epoch 5/15
702/703 [============================>.] - ETA: 0s - loss: 58.9831 - f1: 90.67
703/703 [==============================] - 297s - loss: 58.8993   
...

Evaluating a model

To evaluate the trained model, call the eval method:

model.eval(x_test, y_test)

After evaluation, the F1 score is printed:

- f1: 90.67

Tagging a sentence

Let's try tagging the sentence "President Obama is speaking at the White House." To tag a sentence, call the analyze method:

>>> words = 'President Obama is speaking at the White House.'.split()
>>> model.analyze(words)
{
    "words": [
        "President",
        "Obama",
        "is",
        "speaking",
        "at",
        "the",
        "White",
        "House."
    ],
    "entities": [
        {
            "beginOffset": 1,
            "endOffset": 2,
            "score": 1,
            "text": "Obama",
            "type": "PER"
        },
        {
            "beginOffset": 6,
            "endOffset": 8,
            "score": 1,
            "text": "White House.",
            "type": "ORG"
        }
    ]
}
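
The beginOffset and endOffset fields index into the words list, so extracting (text, type) pairs from the result is straightforward. A small sketch using the output above:

result = model.analyze(words)
for entity in result['entities']:
    # Slice the original word list by the entity's offsets.
    span = ' '.join(words[entity['beginOffset']:entity['endOffset']])
    print(span, entity['type'])
# Obama PER
# White House. ORG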

Downloading pre-trained models

To download a pre-trained model, call the download function:

from anago.utils import download

dir_path = 'models'
url = 'https://storage.googleapis.com/chakki/datasets/public/models.zip'
download(url, dir_path)
model = anago.Sequence.load(dir_path)
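
The loaded model can then be used for tagging right away, for example:

words = 'President Obama is speaking at the White House.'.split()
model.analyze(words)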

Reference

This library implements the bidirectional LSTM + CRF model described in Neural Architectures for Named Entity Recognition (Lample et al., NAACL 2016).
