
Commit d33c476
molereddy and Atry authored

feat: add CE-U loss (#127)

* feat: add CE-U loss
* chore: move CE-U related functions to `ceu.py`
* Ruff format
* remove runner
* Add CE-U link
* Add CE-U runner
* Link OU model collection
* Update verbiage

Co-authored-by: Bo Yang <bo@tacnode.io>

1 parent b80a5fc commit d33c476

6 files changed: 179 additions & 6 deletions

README.md

Lines changed: 5 additions & 5 deletions

@@ -19,7 +19,7 @@
 
 ## 📖 Overview
 
-We provide efficient and streamlined implementations of the TOFU, MUSE and WMDP unlearning benchmarks while supporting 8+ unlearning methods, 5+ datasets, 10+ evaluation metrics, and 7+ LLM architectures. Each of these can be easily extended to incorporate more variants.
+We provide efficient and streamlined implementations of the TOFU, MUSE and WMDP unlearning benchmarks while supporting 11+ unlearning methods, 5+ datasets, 10+ evaluation metrics, and 7+ LLM architectures. Each of these can be easily extended to incorporate more variants.
 
 We invite the LLM unlearning community to collaborate by adding new benchmarks, unlearning methods, datasets and evaluation metrics here to expand OpenUnlearning's features, gain feedback from wider usage and drive progress in the field.
 
@@ -33,7 +33,7 @@ We invite the LLM unlearning community to collaborate by adding new benchmarks,
 🌟 **Highlights:**
 - A detailed technical report on OpenUnlearning covering the design, features, and implementation.
-- A meta-evaluation framework for benchmarking unlearning evaluations across 450+ open-source models.
+- A meta-evaluation framework for benchmarking unlearning evaluations across 450+ models, open-sourced on HuggingFace 🤗: [TOFU Models w & w/o Knowledge](https://huggingface.co/collections/open-unlearning/tofu-models-w-and-w-o-knowledge-6861e4d935eb99ba162e55cd), [TOFU Unlearned Models](https://huggingface.co/collections/open-unlearning/tofu-unlearned-models-6860f6cf3fe35d0223d92e88).
 - Results benchmarking 8 diverse unlearning methods in one place using 10 evaluation metrics on TOFU.
 
 <details>
@@ -77,10 +77,10 @@ We provide several variants for each of the components in the unlearning pipelin
 | **Component** | **Available Options** |
 |------------------------|----------------------|
 | **Benchmarks** | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/), [WMDP](https://www.wmdp.ai/) |
-| **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU, UNDIAL, AltPO, SatImp, WGA |
+| **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU, UNDIAL, AltPO, SatImp, WGA, CE-U |
 | **Evaluation Metrics** | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks, [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) |
 | **Datasets** | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits), WMDP-Bio, WMDP-Cyber |
-| **Model Families** | TOFU: LLaMA-3.2, LLaMA-3.1, LLaMA-2; MUSE: LLaMA-2; Additional: Phi-3.5, Phi-1.5, Gemma, Zephyr |
+| **Model Families** | TOFU: Llama-3.2, Llama-3.1, Llama-2; MUSE: Llama-2; Additional: Phi-3.5, Phi-1.5, Gemma, Zephyr |
 
 ---
@@ -124,7 +124,7 @@ python setup_data.py --eval # saves/eval now contains evaluation results of the
 ### 🔄 Updated TOFU benchmark
 
-We've updated Open-Unlearning's TOFU benchmark target models to use a wider variety of newer architectures with sizes varying from 1B to 8B. These include LLaMA 3.2 1B, LLaMA 3.2 3B, LLaMA 3.1 8B, and the original LLaMA-2 7B (re-created) target models from [the old version of TOFU](github.com/locuslab/tofu).
+We've updated Open-Unlearning's TOFU benchmark target models to use a wider variety of newer architectures with sizes varying from 1B to 8B. These include Llama 3.2 1B, Llama 3.2 3B, Llama 3.1 8B, and the original Llama-2 7B (re-created) target models from [the old version of TOFU](github.com/locuslab/tofu).
 
 For each architecture, we have finetuned with four different splits of the TOFU datasets: `full`, `retain90`, `retain95`, `retain99`, for a total of 16 finetuned models. The first serves as the target (base model for unlearning) and the rest are retain models used to measure performance against for each forget split. These models are on [HuggingFace](`https://huggingface.co/collections/open-unlearning/tofu-new-models-67bcf636334ea81727573a9f0`) and the paths to these models can be set in the experimental configs or in command-line overrides.

community/methods/CEU/run.sh

Lines changed: 68 additions & 0 deletions

#!/bin/bash

export MASTER_PORT=$(python -c "import socket; s=socket.socket(); s.bind(('', 0)); print(s.getsockname()[1]); s.close()")
echo "Master Port: $MASTER_PORT"

########################################################################################################################
########################################### Unlearn TOFU models with CE-U #############################################
########################################################################################################################

models=(
    "Llama-3.2-1B-Instruct"
)
trainers_experiments=(
    "CEU unlearn/tofu/default.yaml"
)
forget_retain_splits=(
    "forget10 retain90"
    "forget05 retain95"
    "forget01 retain99"
)

per_device_train_batch_size=16
gradient_accumulation_steps=2

lrs=(1e-5)

for split in "${forget_retain_splits[@]}"; do
    forget_split=$(echo $split | cut -d' ' -f1)
    retain_split=$(echo $split | cut -d' ' -f2)
    for model in "${models[@]}"; do
        for trainer_experiment in "${trainers_experiments[@]}"; do
            trainer=$(echo $trainer_experiment | cut -d' ' -f1)
            experiment=$(echo $trainer_experiment | cut -d' ' -f2)
            for lr in "${lrs[@]}"; do
                task_name=tofu_${model}_${forget_split}_${trainer}_lr${lr}
                model_path=open-unlearning/tofu_${model}_full
                echo ${task_name}: Unlearning ${model_path} using ${trainer}

                # Unlearn
                CUDA_VISIBLE_DEVICES=0 \
                python src/train.py --config-name=unlearn.yaml \
                    experiment=${experiment} \
                    trainer=${trainer} \
                    task_name=${task_name} \
                    model=${model} \
                    forget_split=${forget_split} \
                    retain_split=${retain_split} \
                    model.model_args.pretrained_model_name_or_path=${model_path} \
                    retain_logs_path=saves/eval/tofu_${model}_${retain_split}/TOFU_EVAL.json \
                    trainer.args.per_device_train_batch_size=$per_device_train_batch_size \
                    trainer.args.gradient_accumulation_steps=$gradient_accumulation_steps \
                    trainer.args.eval_strategy=no \
                    trainer.args.eval_on_start=False \
                    trainer.args.learning_rate=$lr

                # Eval
                CUDA_VISIBLE_DEVICES=0 python src/eval.py \
                    experiment=eval/tofu/default.yaml \
                    forget_split=${forget_split} \
                    model=${model} \
                    task_name=${task_name} \
                    model.model_args.pretrained_model_name_or_path=saves/unlearn/${task_name} \
                    paths.output_dir=saves/unlearn/${task_name}/evals \
                    retain_logs_path=saves/eval/tofu_${model}_${retain_split}/TOFU_EVAL.json
            done
        done
    done
done

configs/trainer/CEU.yaml

Lines changed: 6 additions & 0 deletions

defaults:
  - finetune

handler: CEU
method_args:
  ignore_first_n_answer_tokens: 1
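The `ignore_first_n_answer_tokens` setting feeds the masking trick in `compute_batch_ceu` (see `src/trainer/unlearn/ceu.py` below): the first n non-padding answer tokens are re-masked to `-100` so the unlearning loss skips them. A toy sketch of that masking (illustrative only, not repo code; the label values are made up):

```python
import torch

ignore_first_n = 1
# -100 marks prompt/padding positions; the remaining ids are answer tokens.
labels = torch.tensor([[-100, -100, 42, 7, 99]])

valid_mask = labels != -100
# cumsum counts valid tokens left-to-right; <= n selects the first n of them.
update_mask = (valid_mask.cumsum(dim=-1) <= ignore_first_n) & valid_mask
masked = labels.masked_fill(update_mask, -100)

print(masked.tolist())  # → [[-100, -100, -100, 7, 99]]
```

With `ignore_first_n_answer_tokens: 1`, only the first answer token (42) is additionally excluded from the loss.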

docs/links.md

Lines changed: 2 additions & 1 deletion

@@ -36,6 +36,7 @@ Links to research papers and resources corresponding to implemented features in
 | AltPO | Paper[📄](https://arxiv.org/pdf/2409.13474), Code [🐙](https://github.com/molereddy/Alternate-Preference-Optimization) |
 | SatImp | Paper[📄](https://arxiv.org/pdf/2505.11953), Code [🐙](https://github.com/Puning97/SatImp-for-LLM-Unlearning) |
 | WGA (G-effect) | Paper[📄](https://arxiv.org/pdf/2502.19301), Code [🐙](https://github.com/tmlr-group/G-effect) |
+| CE-U (Cross-Entropy Unlearning) | Paper[📄](https://arxiv.org/pdf/2503.01224) |
 
 ---
@@ -59,7 +60,7 @@ Links to research papers and resources corresponding to implemented features in
 | Forget Quality, Truth Ratio, Model Utility | TOFU ([📄](https://arxiv.org/abs/2401.06121)) |
 | Extraction Strength (ES) | Carlini et al., 2021 ([📄](https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting)), used for unlearning in Wang et al., 2025 ([📄](https://openreview.net/pdf?id=wUtCieKuQU)) |
 | Exact Memorization (EM) | Tirumala et al., 2022 ([📄](https://proceedings.neurips.cc/paper_files/paper/2022/hash/fa0509f4dab6807e2cb465715bf2d249-Abstract-Conference.html)), used for unlearning in Wang et al., 2025 ([📄](https://openreview.net/pdf?id=wUtCieKuQU)) |
-| lm-evaluation-harness | [💻](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) |
+| lm-evaluation-harness | Repository: [💻](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) |
 
 ---

src/trainer/__init__.py

Lines changed: 2 additions & 0 deletions

@@ -11,6 +11,7 @@
 from trainer.unlearn.simnpo import SimNPO
 from trainer.unlearn.rmu import RMU
 from trainer.unlearn.undial import UNDIAL
+from trainer.unlearn.ceu import CEU
 from trainer.unlearn.satimp import SatImp
 from trainer.unlearn.wga import WGA
 
@@ -93,5 +94,6 @@ def load_trainer(
 _register_trainer(SimNPO)
 _register_trainer(RMU)
 _register_trainer(UNDIAL)
+_register_trainer(CEU)
 _register_trainer(SatImp)
 _register_trainer(WGA)

src/trainer/unlearn/ceu.py

Lines changed: 96 additions & 0 deletions

from trainer.unlearn.base import UnlearnTrainer

import torch
import torch.nn.functional as F


def cross_entropy_unlearning_loss(
    logits: torch.Tensor,
    labels: torch.Tensor,
    ignore_index: int = -100,
) -> torch.Tensor:
    """
    Implementation of the Cross-Entropy Unlearning loss (CE-U).

    This function creates a modified target distribution by setting the logit
    corresponding to the true label to negative infinity, effectively forcing
    the model to assign zero probability to the correct answer. The loss then
    minimizes the KL divergence between this target distribution and the
    model's output.

    Args:
        logits: Model output logits with shape [batch_size, sequence_length, vocabulary_size]
        labels: Ground-truth token indices with shape [batch_size, sequence_length]
        ignore_index: Token index to ignore in the loss calculation (typically padding)

    Returns:
        A scalar tensor representing the mean unlearning loss across valid positions
    """
    batch_size, sequence_length, vocabulary_size = logits.shape
    # Extract valid logits and labels based on ignore_index.
    if ignore_index is not None:
        # Shape: [batch_size, sequence_length], boolean mask
        valid_mask = labels != ignore_index
        # Shape: [num_valid_positions, vocabulary_size]
        valid_logits = logits[valid_mask]
        # Shape: [num_valid_positions]
        valid_labels = labels[valid_mask]
    else:
        # Shape: [batch_size*sequence_length, vocabulary_size]
        valid_logits = logits.view(-1, vocabulary_size)
        # Shape: [batch_size*sequence_length]
        valid_labels = labels.view(-1)

    # Create a copy of valid_logits to generate the target distribution.
    # Shape: [num_valid_positions, vocabulary_size]
    valid_target_logits = valid_logits.detach().clone()

    # Suppress the logits corresponding to the true token by setting them to -inf.
    # This ensures that the probability for the true token is exactly zero after softmax.
    valid_target_logits.scatter_(
        dim=-1,
        index=valid_labels.unsqueeze(-1),  # Shape: [num_valid_positions, 1]
        value=float("-inf"),
    )  # Result shape: [num_valid_positions, vocabulary_size]

    # Apply softmax to generate the target probability distribution.
    # Shape: [num_valid_positions, vocabulary_size]
    valid_target_probabilities = F.softmax(valid_target_logits, dim=-1)

    # Compute the cross-entropy loss between input logits and target probabilities.
    # The loss is averaged over the valid positions, yielding a scalar tensor.
    return F.cross_entropy(
        input=valid_logits,
        target=valid_target_probabilities,
    )


def compute_batch_ceu(model, inputs, ignore_first_n_answer_tokens=1):
    outputs = model(**inputs)
    logits = outputs.logits
    labels = inputs["labels"]

    # Trick to ignore the first n answer tokens, mentioned in a footnote of the
    # Training Settings section of arXiv:2503.01224.
    valid_mask = labels != -100
    update_mask = (
        valid_mask.cumsum(dim=-1) <= ignore_first_n_answer_tokens
    ) & valid_mask
    labels_without_first_n_answer_tokens = labels.masked_fill(update_mask, -100)

    shifted_labels = labels_without_first_n_answer_tokens[..., 1:].contiguous()
    shifted_logits = logits[..., :-1, :].contiguous()
    loss = cross_entropy_unlearning_loss(
        shifted_logits, shifted_labels, ignore_index=-100
    )
    return loss, outputs


class CEU(UnlearnTrainer):
    def __init__(self, ignore_first_n_answer_tokens=1, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.ignore_first_n_answer_tokens = ignore_first_n_answer_tokens

    def compute_loss(self, model, inputs, return_outputs=False):
        forget_inputs = inputs["forget"]
        loss, outputs = compute_batch_ceu(
            model,
            forget_inputs,
            ignore_first_n_answer_tokens=self.ignore_first_n_answer_tokens,
        )
        return (loss, outputs) if return_outputs else loss
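To sanity-check the loss construction, the core steps can be rerun standalone on random tensors (a sketch that re-derives the target distribution as `cross_entropy_unlearning_loss` does; the shapes and seed are arbitrary, not from the repo):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(2, 4, 10)           # [batch, seq, vocab]
labels = torch.randint(0, 10, (2, 4))    # ground-truth token ids

valid_logits = logits.view(-1, 10)
valid_labels = labels.view(-1)

# Build the CE-U target: true-token logit -> -inf, then softmax.
target_logits = valid_logits.detach().clone()
target_logits.scatter_(-1, valid_labels.unsqueeze(-1), float("-inf"))
target_probs = F.softmax(target_logits, dim=-1)

# The target assigns exactly zero probability to every true token,
# so minimizing this loss pushes mass away from the ground-truth answer.
true_probs = target_probs.gather(-1, valid_labels.unsqueeze(-1))
loss = F.cross_entropy(valid_logits, target_probs)

print(float(true_probs.max()))  # → 0.0
```

Note that `F.cross_entropy` accepts class probabilities as `target` (PyTorch ≥ 1.10), which is what makes this soft-target formulation a one-liner.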
