cl-tohoku/zipfian-whitening

Zipfian Whitening

This repository contains the code for the NeurIPS 2024 paper Zipfian Whitening by Sho Yokoi, Han Bao, Hiroto Kurita and Hidetoshi Shimodaira.
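For orientation, the core idea can be sketched in a few lines: centering and whitening are computed as expectations under the empirical word-frequency distribution rather than under a uniform distribution over the vocabulary. The snippet below is a minimal illustration under that reading of the paper, not the code in src; the function name `zipfian_whitening`, its parameters, and the `eps` stabilizer are assumptions made for this example.

```python
import numpy as np


def zipfian_whitening(embeddings, word_probs, eps=1e-12):
    """Whiten embeddings using frequency-weighted statistics.

    embeddings: (n_words, dim) array, one row per word.
    word_probs: (n_words,) non-negative word frequencies (any scale).
    NOTE: hypothetical helper for illustration, not the repository's API.
    """
    W = np.asarray(embeddings, dtype=np.float64)
    p = np.asarray(word_probs, dtype=np.float64)
    p = p / p.sum()  # normalize to a probability distribution

    # Frequency-weighted mean: E_{w ~ p}[w]
    mean = p @ W
    centered = W - mean

    # Frequency-weighted covariance: E_{w ~ p}[(w - mean)(w - mean)^T]
    cov = (centered * p[:, None]).T @ centered

    # PCA whitening: project onto eigenvectors and rescale each
    # direction by 1 / sqrt(eigenvalue).
    eigvals, eigvecs = np.linalg.eigh(cov)
    transform = eigvecs / np.sqrt(eigvals + eps)
    return centered @ transform


# Toy usage with a Zipf-like frequency profile (p proportional to 1 / rank).
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(1000, 8))
toy_probs = 1.0 / np.arange(1, 1001)
whitened = zipfian_whitening(toy_embeddings, toy_probs)
```

After the transform, the frequency-weighted mean of the rows is approximately zero and their frequency-weighted covariance is approximately the identity. Passing uniform `word_probs` recovers ordinary centering and whitening.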

Overview

This repository has two main parts: experiments on static word embeddings and experiments on transformer-based embeddings.

  • Experiments for static word embeddings: STS evaluation based on the MTEB benchmark.
  • Experiments for transformer-based embeddings: STS evaluation based on SimCSE's SentEval implementation. All of this code lives under the SimCSE directory; see that directory for details.
.
├── data # dataset files for static word embeddings experiments
├── notebooks # notebooks for visualization for static word embeddings experiments
├── results # results for static word embeddings experiments
├── scripts # scripts for running experiments for static word embeddings
├── src # source code for static word embeddings experiments
├── SimCSE # source code for transformer-based embeddings experiments
    ├── ...
├── ...

Experiments for static word embeddings

1. Install dependencies

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Download pre-trained embeddings / convert to torch format

Download word2vec binary
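The exact conversion command is not reproduced here. As a rough sketch of what reading the word2vec binary format involves, the snippet below parses the standard layout (a text header with vocabulary size and dimension, then each word followed by a space and its little-endian float32 vector). The function name `read_word2vec_binary` is hypothetical and this is not the repository's conversion script; in practice `gensim.models.KeyedVectors.load_word2vec_format(path, binary=True)` does the same job much faster.

```python
import numpy as np


def read_word2vec_binary(path):
    """Read a word2vec .bin file into (words, matrix).

    NOTE: hypothetical helper for illustration; it reads words byte by
    byte, so it is slow on multi-gigabyte files.
    """
    words, vectors = [], []
    with open(path, "rb") as f:
        # Header line: "<vocab_size> <dim>\n"
        vocab_size, dim = map(int, f.readline().split())
        nbytes = 4 * dim  # float32 takes 4 bytes per component
        for _ in range(vocab_size):
            chars = bytearray()
            while True:
                ch = f.read(1)
                if ch == b" ":
                    break  # the word ends at the first space
                if ch == b"":
                    raise EOFError("unexpected end of file while reading a word")
                if ch != b"\n":  # skip the newline that may follow each vector
                    chars.extend(ch)
            words.append(chars.decode("utf-8", errors="ignore"))
            vectors.append(np.frombuffer(f.read(nbytes), dtype="<f4"))
    return words, np.stack(vectors)
```

The returned matrix can then be wrapped with `torch.from_numpy` and stored with `torch.save`, assuming PyTorch is installed; the file layout the experiment scripts expect is not specified here.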

3. Reproduce the experimental results

source .venv/bin/activate
bash scripts/run.sh

4. Visualize the results

  • See the notebooks in the notebooks directory to visualize the results.

Note

[1] Sanjeev Arora, Yingyu Liang and Tengyu Ma. "A Simple but Tough-to-Beat Baseline for Sentence Embeddings" In ICLR, 2017.
[2] William Rudman, Nate Gillman, Taylor Rayne, and Carsten Eickhoff. "IsoScore: Measuring the Uniformity of Embedding Space Utilization" In Findings of ACL, 2022.

Experiments for transformer-based embeddings

1. Install dependencies

From the SimCSE directory, run the following commands:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Download STS datasets

From the SimCSE directory, run the following commands:

cd SentEval/data/downstream/
bash download_dataset.sh

3. Reproduce the experimental results

From the SimCSE directory, run the following commands:

source .venv/bin/activate
bash scripts/run.sh

Citation

If you use this codebase or find this work helpful, please cite:

@inproceedings{
yokoi2024zipfian,
    title={Zipfian Whitening},
    author={Sho Yokoi and Han Bao and Hiroto Kurita and Hidetoshi Shimodaira},
    booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
    year={2024},
    url={https://openreview.net/forum?id=pASJxzMJb7}
}
