entity-linkings is a unified library for entity linking.
```shell
# from PyPI
pip install entity-linkings

# from source
git clone [email protected]:naist-nlp/entity-linkings.git
cd entity-linkings
pip install .

# for uv users
git clone [email protected]:naist-nlp/entity-linkings.git
cd entity-linkings
uv sync
```
entity-linkings provides two interfaces: a command-line interface (CLI) and a Python API.
The CLI can train, evaluate, and run entity linking systems from the command line.
To create an EL system, you must first build a candidate retriever with entitylinkings-train-retrieval.
In this example, the e5bm25 retriever is trained on a custom dataset.
```shell
entitylinkings-train-retrieval \
  --retriever_id e5bm25 \
  --train_file train.jsonl \
  --validation_file validation.jsonl \
  --dictionary_id_or_path dictionary.jsonl \
  --output_dir save_model/ \
  --num_hard_negatives 4 \
  --num_train_epochs 10 \
  --train_batch_size 8 \
  --validation_batch_size 16 \
  --config config.yaml \
  --wandb
```

Next, Entity Disambiguation (ED) and End-to-End Entity Linking (EL) systems can be trained with entitylinkings-train.
This example trains the FEVRY model with a custom candidate retriever.
```shell
entitylinkings-train \
  --model_type ed \
  --model_id fevry \
  --model_name_or_path google-bert/bert-base-uncased \
  --retriever_id e5bm25 \
  --retriever_model_name_or_path save_model/ \
  --dictionary_id_or_path dictionary.jsonl \
  --train_file train.jsonl \
  --validation_file validation.jsonl \
  --num_candidates 30 \
  --num_train_epochs 2 \
  --train_batch_size 8 \
  --validation_batch_size 16 \
  --output_dir save_fevry/ \
  --config config.yaml \
  --wandb
```

Finally, you can evaluate retrievers or EL systems with entitylinkings-eval-retrieval or entitylinkings-eval, respectively.
```shell
entitylinkings-eval-retrieval \
  --retriever_id <retriever_id> \
  --model_name_or_path save_model/ \
  --dictionary_id_or_path dictionary.jsonl \
  --test_file test.jsonl \
  --config config.yaml \
  --output_dir result/ \
  --test_batch_size 256 \
  --wandb
```

```shell
entitylinkings-eval \
  --model_type ed \
  --model_id fevry \
  --model_name_or_path save_fevry/ \
  --retriever_id e5bm25 \
  --retriever_model_name_or_path save_model/ \
  --dictionary_id_or_path dictionary.jsonl \
  --test_file test.jsonl \
  --config config.yaml \
  --output_dir result/ \
  --test_batch_size 256 \
  --wandb
```

You can change the arguments (e.g., context length) using a configuration file.
A config.yaml with default values can be generated via entitylinkings-gen-config.

```shell
entitylinkings-gen-config
```

This is an example of running ChatEL with the ZELDA candidate list via the Python API.
Valid IDs for get_retrievers() and get_models() can be found with get_retriever_ids() and get_model_ids(), respectively.
```python
from entity_linkings import get_retrievers, get_models, load_dictionary

# Load dictionary from a dictionary_id or local path
dictionary = load_dictionary('zelda')

# Load candidate retriever
retriever_cls = get_retrievers('zeldacl')
retriever = retriever_cls(
    dictionary,
    config=retriever_cls.Config()
)

# Set up the ED or EL model
model_cls = get_models('chatel')
model = model_cls(
    task='ed',
    retriever=retriever,
    config=model_cls.Config("gpt-4o")
)

# Prediction
sentence = "NAIST is in Ikoma."
spans = [(0, 5)]
predictions = model.predict(sentence, spans, top_k=1)
print("ID: ", predictions[0][0]["id"])
print("Title: ", predictions[0][0]["prediction"])
print("Score: ", predictions[0][0]["score"])
```

Please refer to the links below for instructions on how to run each model.
- BM25
- ZELDA Candidate List (Milich and Akbik, 2023)
- Dual Encoder Model
- Text Embedding Model
- E5+BM25 (Nakatani et al., 2025)
- FEVRY (Févry et al., 2020)
- BLINK (Wu et al., 2020)
- ExtEnD (Barba et al., 2022)
- FusionED (Wang et al., 2024)
- ChatEL (Ding et al., 2024)
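The `spans` argument in the API example above is a list of character offsets into the sentence. The following is a minimal standard-library sketch for deriving such offsets from known mention strings; the `mention_spans` helper is purely illustrative and not part of entity-linkings.

```python
def mention_spans(text, mentions):
    """Return (start, end) character spans for mentions, in order of appearance."""
    spans = []
    cursor = 0
    for mention in mentions:
        start = text.find(mention, cursor)
        if start == -1:
            raise ValueError(f"mention not found: {mention!r}")
        spans.append((start, start + len(mention)))
        cursor = start + len(mention)  # continue after this mention
    return spans

sentence = "NAIST is in Ikoma."
print(mention_spans(sentence, ["NAIST", "Ikoma"]))  # [(0, 5), (12, 17)]
```

The resulting tuples can be passed directly as `spans` to `model.predict`.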
| dictionary_id | Dataset | Language | Domain |
|---|---|---|---|
| kilt | KILT (Petroni et al., 2021) | English | Wikipedia |
| zelda | ZELDA (Milich and Akbik, 2023) | English | Wikipedia |
| zeshel | ZeshEL (Logeswaran et al., 2021) | English | Wikia |
If you want to use our packages with your custom ontologies, you need to convert them to the following format:

```json
{
  "id": "000011",
  "name": "NAIST",
  "description": "NAIST is located in Ikoma."
}
```
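For example, a custom ontology can be dumped to this JSONL format (one JSON object per line) with nothing but the standard library; the entry values and output file name below are illustrative:

```python
import json

# Illustrative custom ontology entries in the dictionary format above.
entries = [
    {"id": "000011", "name": "NAIST", "description": "NAIST is located in Ikoma."},
    {"id": "000012", "name": "Ikoma", "description": "Ikoma is a city in Nara Prefecture."},
]

# Write one JSON object per line, keeping non-ASCII characters readable.
with open("dictionary.jsonl", "w", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

The resulting file can then be passed as `--dictionary_id_or_path dictionary.jsonl`.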
| dataset_id | Dataset | Domain | Language | Ontology | Train | Licence |
|---|---|---|---|---|---|---|
| kilt | KILT (Petroni et al., 2021) | Wikipedia | English | Wikipedia | ✅ | Unknown* |
| zelda | ZELDA (Milich and Akbik, 2023) | Wikimedia | English | Wikipedia | ✅ | Unknown* |
| msnbc | MSNBC (Cucerzan, 2007) | News | English | Wikipedia | | Unknown* |
| aquaint | AQUAINT (Milne and Witten, 2008) | News | English | Wikipedia | | Unknown* |
| ace2004 | ACE2004 (Ratinov et al., 2011) | News | English | Wikipedia | | Unknown* |
| kore50 | KORE50 (Hoffart et al., 2012) | News | English | Wikipedia | | CC BY-SA 3.0 |
| n3-r128 | N3-Reuters-128 (Röder et al., 2014) | News | English | Wikipedia | | GNU AGPL-3.0 |
| n3-r500 | N3-RSS-500 (Röder et al., 2014) | RSS | English | Wikipedia | | GNU AGPL-3.0 |
| derczynski | Derczynski (Derczynski et al., 2015) | | English | Wikipedia | | CC-BY 4.0 |
| oke-2015 | OKE-2015 (Nuzzolese et al., 2015) | News | English | Wikipedia | | Unknown* |
| oke-2016 | OKE-2016 (Nuzzolese et al., 2015) | News | English | Wikipedia | | Unknown* |
| wned-wiki | WNED-WIKI (Guo and Barbosa, 2018) | Wikipedia | English | Wikipedia | | Unknown |
| wned-cweb | WNED-CWEB (Guo and Barbosa, 2018) | Web | English | Wikipedia | | Apache License 2.0 |
| unseen | WikilinksNED Unseen-Mentions (Onoe and Durrett, 2020) | News | English | Wikipedia | ✅ | CC-BY 3.0* |
| tweeki | Tweeki EL (Harandizadeh and Singh, 2020) | | English | Wikipedia | | Apache License 2.0 |
| reddit-comments | Reddit EL (Botzer et al., 2021) | | English | Wikipedia | | CC-BY 4.0 |
| reddit-posts | Reddit EL (Botzer et al., 2021) | | English | Wikipedia | | CC-BY 4.0 |
| shadowlink-shadow | ShadowLink (Provatorova et al., 2021) | Wikipedia | English | Wikipedia | | Unknown* |
| shadowlink-top | ShadowLink (Provatorova et al., 2021) | Wikipedia | English | Wikipedia | | Unknown* |
| shadowlink-tail | ShadowLink (Provatorova et al., 2021) | Wikipedia | English | Wikipedia | | Unknown* |
| zeshel | ZeshEL (Logeswaran et al., 2021) | Wikia | English | Wikia | ✅ | CC-BY-SA |
| docred | Linked-DocRED (Genest et al., 2023) | News | English | Wikipedia | ✅ | CC-BY 4.0 |
- The original MSNBC (Cucerzan, 2007) is no longer available because the official link has expired. You can download the dataset from the official GERBIL code.
- The licensing status of ShadowLink and OKE-{2015,2016} for public use is uncertain, but they are provided in their official repositories.
- WikilinksNED Unseen-Mentions was created by splitting WikilinksNED, which is derived from the Wikilinks corpus, made available under CC-BY 3.0.
- The following datasets are not publicly available, or their availability is uncertain. If you want to evaluate these resources, please register with the LDC and convert the datasets to our format.
  - AIDA CoNLL-YAGO (Hoffart et al., 2011): you must sign the agreement to use the Reuters Corpus.
  - TACKBP-2010 (Ji et al., 2011): you must sign the Text Analysis Conference (TAC) Knowledge Base Population Evaluation License Agreement.
- Results for the ZeshEL/ZELDA benchmarks (aida-b, tweeki, reddit-*, shadowlink-*, and wned-*) across all models can be found in the Spreadsheet.
If you want to use our packages with your private dataset, you must convert it to the following format:

```json
{
  "id": "doc-001-P1",
  "text": "She graduated from NAIST.",
  "entities": [{"start": 19, "end": 24, "label": ["000011"]}]
}
```
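As a sketch, a record in this format can be built and sanity-checked with the standard library alone. The offsets appear to be 0-based and end-exclusive, consistent with the example above, where `text[19:24]` yields "NAIST"; this interpretation is an assumption from the sample, not a documented guarantee.

```python
# A document record in the dataset format shown above (values from the example).
record = {
    "id": "doc-001-P1",
    "text": "She graduated from NAIST.",
    "entities": [{"start": 19, "end": 24, "label": ["000011"]}],
}

# Sanity-check: each span should slice out the intended mention,
# assuming 0-based, end-exclusive character offsets.
for entity in record["entities"]:
    mention = record["text"][entity["start"]:entity["end"]]
    print(mention)  # NAIST
```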