Token-based Decision Criteria Are Suboptimal in In-context Learning

Hakaze Cho, et al.


This repo contains the official code for the following paper, published at the NAACL 2025 main conference:

Hakaze Cho, et al. "Token-based Decision Criteria Are Suboptimal in In-context Learning." The 2025 Annual Conference of the Nations of the Americas Chapter of the ACL (NAACL): Main conference, 2025.

Implemented by Hakaze Cho, the primary contributor of the paper.

Re-implemented versions of Hidden Calibration can also be found in StaICC or ICL_Circuit.

Overview

Abstract

In-Context Learning (ICL) typically utilizes classification criteria from the output probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, even with delicate calibrations applied through translation and constrained rotation. To address this problem, we propose Hidden Calibration, which renounces token probabilities and uses a nearest centroid classifier on the LM's last hidden states. In detail, we assign each test sample the label of the nearest centroid, where the centroids are estimated in advance from a calibration set. Our experiments on 6 models and 10 classification datasets indicate that Hidden Calibration consistently outperforms current token-based baselines by about 20%~50%, achieving a strong state-of-the-art in ICL. Our further analysis demonstrates that Hidden Calibration finds better classification criteria with less inter-class overlap, and that LMs provide linearly separable intra-class clusters with the help of demonstrations, which supports Hidden Calibration and gives new insights into the principle of ICL.

Summary figure

In the ICL diagram: A. The ICL prompt consists of demonstrations and a query; the LM encodes the prompt into the last hidden state $h$. B. Previous works use the un-embedding vectors of the label tokens ($E^U_+$ and $E^U_-$) to decode $h$ into the prediction $\hat{y}$, with calibrations applied to adjust the predicted logits. C. Our work uses a calibration set to calculate centroids ($\overline{h}_+$ and $\overline{h}_-$) and decodes $h$ by assigning the label of the nearest centroid.
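For intuition, the decision rule in panel C amounts to a nearest-centroid classifier over last hidden states. The sketch below is a minimal illustration, not the repository's implementation: the model name, the prompt format, and the use of the final token's last-layer hidden state are assumptions made for the example.

```python
# Minimal sketch of a nearest-centroid decision rule over last hidden states.
# Assumptions for illustration: a small causal LM, single-sequence prompts,
# and the last-layer hidden state at the final token position as the feature.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # hypothetical choice; the paper evaluates 6 different LMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def last_hidden_state(prompt: str) -> torch.Tensor:
    """Return the last layer's hidden state at the final token position."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.hidden_states[-1][0, -1]  # shape: (hidden_dim,)

def estimate_centroids(calibration_set):
    """calibration_set: iterable of (prompt, label) pairs with in-context demonstrations."""
    buckets = {}
    for prompt, label in calibration_set:
        buckets.setdefault(label, []).append(last_hidden_state(prompt))
    return {label: torch.stack(states).mean(dim=0) for label, states in buckets.items()}

def predict(prompt: str, centroids: dict) -> str:
    """Assign the label of the nearest centroid (Euclidean distance)."""
    h = last_hidden_state(prompt)
    return min(centroids, key=lambda label: torch.norm(h - centroids[label]))
```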

Set Up

0. Requirement

  1. A GPU with more than 22 GB of VRAM and CUDA (Ver. 12.4 recommended) is required to run all the experiments.
  2. A network connection to Hugging Face is needed to download the pre-trained models, and a Hugging Face user token with access to the Llama 2 models is recommended for part of the experiments (see the login sketch after this list).
  3. Anaconda or Miniconda is needed.
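If you want to run the Llama 2 experiments, the gated checkpoints can only be downloaded after authenticating with your Hugging Face user token. A minimal sketch (the token string is a placeholder, not a real value):

```python
# Authenticate to Hugging Face so gated checkpoints (e.g., Llama 2) can be downloaded.
# The token below is a placeholder; use your own user token with Llama 2 access.
from huggingface_hub import login

login(token="hf_xxx")  # alternatively, run `huggingface-cli login` in a shell
```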

1. Clone the repository

git clone https://github.com/hc495/Hidden_Calibration.git

2. Environment Installation

conda env create -f environment.yaml
conda activate hidden_calibration

3. Make Sure Your Working Directory is the Root Directory of the Project

You need to ensure that your working directory is set to the root directory of the project, i.e., the same directory as README.md, even if you open a Jupyter notebook from the Experiments folder.

We provide a default os.chdir() call in every notebook; use it to move the working directory to the root directory.
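A minimal sketch of such a working-directory cell (the relative path is an assumption; adjust it to wherever you opened the notebook from):

```python
# Move the working directory from Experiments/ to the project root
# so that relative paths used by the code resolve correctly.
import os

os.chdir("..")       # assumes the notebook was opened from the Experiments folder
print(os.getcwd())   # should print the repository root (the directory containing README.md)
```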

Experiments

We use Jupyter notebooks in the Experiments folder to implement all the experiments described in the main body of the paper. We index these notebooks here with the corresponding result figures/tables, and leave the detailed experiment instructions in each notebook.

  1. Exp1_Main_result.ipynb: The code for the main experiments of the paper: it tests the performance of Hidden Calibration and other methods on in-context learning classification tasks.

  2. Exp2_Analysis1_Overlap.ipynb: The source code for the analysis in Sec. 5.1 and part of Sec. 5.2 of the paper, mainly calculating the inter-category overlap and intra-category variance.

  3. Exp3_Analysis2_Inter_Category_Distance.ipynb: The source code for the analysis in the remaining part of Sec. 5.2 of the paper, mainly calculating the inter-category distance.

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{cho2025token,
  title={Token-based Decision Criteria Are Suboptimal in In-context Learning},
  author={Cho, Hakaze and Sakai, Yoshihiro and Kato, Mariko and Tanaka, Kenshiro and Ishii, Akira and Inoue, Naoya},
  booktitle={Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the ACL},
  url={https://arxiv.org/abs/2406.16535},
  year={2025}
}
