CoNLL 2025 Shared Task: Robust WSI

This is a repository containing data for the Robust Word Sense Induction shared task.

For details, please visit the website.

Sample Files

The sample/ directory contains the annotated test sets for three words for each language.

The files are encoded as UTF-8 and use a columnar format separated by TAB characters. No quoting is used, and the first line gives the names of the columns. All the files have the same structure.

Column headword represents the headword.
Columns starting with sense represent the "gold" annotations, one column per annotator. A value ending with an x means that the annotator has not marked this line in any way.
Column text contains the sentence in which the specific occurrence appears.
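
The format described above can be read with Python's standard csv module. The following is a minimal sketch, not part of the repository: the column names sense_a and sense_b and the example rows are made up for illustration, and only the rules stated above (TAB separators, no quoting, header line, columns starting with sense, values ending with x) are taken from this README.

```python
import csv
import io

def read_sample(stream):
    """Parse a TAB-separated sample file; returns (sense_columns, rows)."""
    # QUOTE_NONE because the files use no quoting at all.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    # Every column whose name starts with "sense" holds one annotator's labels.
    sense_columns = [name for name in (reader.fieldnames or [])
                     if name.startswith("sense")]
    return sense_columns, rows

def is_unmarked(value):
    """True when the annotator has not marked the line (value ends with 'x')."""
    return value.endswith("x")

# Made-up example data; real column names and labels may differ.
example = (
    "headword\tsense_a\tsense_b\ttext\n"
    "bank\t1\t1\tShe sat on the bank of the river.\n"
    "bank\t2\tx\tThe bank raised its interest rates.\n"
)
sense_columns, rows = read_sample(io.StringIO(example))
print(sense_columns)
print([is_unmarked(row["sense_b"]) for row in rows])
```

For a file on disk, open it with encoding="utf-8" and newline="" and pass the file object to read_sample.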

Test Files

The test/ directory contains the files to be clustered by your word sense induction system. The format differs from the sample files in that the annotation columns, which are used for the evaluation, are omitted.

The Scorer Program

To obtain good performance, the scorer is written in Rust. The source code is in the scorer/ directory, and a prebuilt static binary for x86_64 Linux is present in the scorer/bin/ directory.

Usage

Annotate the test set using your own WSI system and create a TSV file containing a column with the cluster labels. A header needs to be present. The default name for the cluster column is cluster. Other columns might be present as well. You can also place the column with the cluster labels into the file containing the gold data.
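
As an illustration of the step above, here is a minimal sketch that copies a test file and appends a cluster column. It assumes the test files keep the headword and text columns; the assign_cluster hook and the toy_assign function are hypothetical placeholders for your own WSI system, and the example rows are made up.

```python
import io

def add_cluster_column(test_stream, out_stream, assign_cluster, column="cluster"):
    """Copy a TAB-separated test file and append a column of cluster labels."""
    header = test_stream.readline().rstrip("\r\n").split("\t")
    # The header line is required; the new column goes last.
    out_stream.write("\t".join(header + [column]) + "\n")
    for line in test_stream:
        if not line.strip():
            continue  # skip empty lines
        values = line.rstrip("\r\n").split("\t")
        row = dict(zip(header, values))
        label = assign_cluster(row)  # hypothetical hook into your WSI system
        out_stream.write("\t".join(values + [str(label)]) + "\n")

def toy_assign(row):
    # Placeholder only: a real system would cluster occurrences by meaning.
    return "c1" if "river" in row["text"] else "c2"

# Made-up example data; real test files may contain different columns.
test_data = (
    "headword\ttext\n"
    "bank\tShe sat on the bank of the river.\n"
    "bank\tThe bank raised its interest rates.\n"
)
out = io.StringIO()
add_cluster_column(io.StringIO(test_data), out, toy_assign)
print(out.getvalue())
```

Fields are joined with plain TAB characters rather than a CSV writer, since the format uses no quoting.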

Then run the scorer and observe the output:

./bin/scorer GOLD_FILE -f CLUSTER_FILE

To change the name of the cluster column, use the -c option. If your labels are in the same file as the gold data, omit the -f option.
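
If you call the scorer from a Python pipeline, the command line can be assembled as in the sketch below. It uses only the options mentioned in this README and assumes that -c takes the column name as its next argument. The file names gold.tsv and clusters.tsv and the column name my_label are hypothetical.

```python
import subprocess

def scorer_command(gold_file, cluster_file=None, cluster_column=None,
                   scorer="./bin/scorer"):
    """Build the argument list for the scorer from the options in this README."""
    command = [scorer, gold_file]
    if cluster_file is not None:
        # Omitted when the labels live in the gold file itself.
        command += ["-f", cluster_file]
    if cluster_column is not None:
        # Assumption: -c is followed by the name of the cluster column.
        command += ["-c", cluster_column]
    return command

def run_scorer(*args, **kwargs):
    """Run the scorer; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(scorer_command(*args, **kwargs), check=True)

# Hypothetical file and column names, shown without running the binary.
print(scorer_command("gold.tsv", "clusters.tsv", "my_label"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.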

Compilation

To build the program yourself, install Rust using https://rustup.rs/ and then run cargo build --release from the scorer/ directory.

Licensing

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Contact

Do not hesitate to contact us at [email protected].
