Skip to content

Commit

Permalink
fixing reprdocubility of lmeval tests (#1220)
Browse files Browse the repository at this point in the history
SUMMARY:
LM Eval weekly tests are failing, this resolves two issues
1. installs pillow, which I had locally through vllm but is not
installed as part of llm-compressor
2. adds a random seed to the lmeval tests, which seems after a good
amount of testing to resolve the issue. it is entirely during
calibration/quantization, lm-eval behavior is deterministic as they
always set a seed. It is a bit surprising that it can have such a
drastic effect, but these are 2B vision-language models and a difficult
multiple choice dataset, not too far away from random guessing.


TEST PLAN:
no new src code

---------

Signed-off-by: Brian Dellabetta <[email protected]>
  • Loading branch information
brian-dellabetta authored Mar 11, 2025
1 parent 64175da commit c1fe865
Show file tree
Hide file tree
Showing 5 changed files with 18 additions and 7 deletions.
9 changes: 6 additions & 3 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,9 +60,12 @@
"datasets",
"accelerate>=0.20.3,!=1.1.0",
"pynvml",
"compressed-tensors"
if version_info.build_type == "release"
else "compressed-tensors-nightly",
"pillow",
(
"compressed-tensors"
if version_info.build_type == "release"
else "compressed-tensors-nightly"
),
],
extras_require={
"dev": [
Expand Down
2 changes: 1 addition & 1 deletion tests/lmeval/configs/vl_fp8_dynamic_per_token.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ cadence: weekly
model: Qwen/Qwen2-VL-2B-Instruct
model_class: TraceableQwen2VLForConditionalGeneration
scheme: FP8_DYNAMIC
seed: 42 # compressed model is sensitive to random seed
lmeval:
model: "hf-multimodal"
model_args:
Expand All @@ -10,7 +11,6 @@ lmeval:
convert_img_format: True
task: mmmu_val_economics
num_fewshot: 0
limit: 1000
batch_size: 8
metrics:
acc,none: 0.333
2 changes: 1 addition & 1 deletion tests/lmeval/configs/vl_int8_w8a8_dynamic_per_token.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ scheme: INT8_dyn_per_token
recipe: tests/e2e/vLLM/recipes/INT8/recipe_int8_channel_weight_dynamic_per_token.yaml
dataset_id: lmms-lab/flickr30k
dataset_split: "test[:512]"
seed: 42 #compressed model is sensitive to random seed
lmeval:
model: "hf-multimodal"
model_args:
Expand All @@ -13,7 +14,6 @@ lmeval:
convert_img_format: True
task: mmmu_val_economics
num_fewshot: 0
limit: 1000
metrics:
acc,none: 0.233
batch_size: 8
4 changes: 2 additions & 2 deletions tests/lmeval/configs/vl_w4a16_actorder_weight.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ recipe: tests/e2e/vLLM/recipes/actorder/recipe_w4a16_actorder_weight.yaml
dataset_id: lmms-lab/flickr30k
dataset_split: "test[:512]"
scheme: W4A16_actorder_group
seed: 42 #compressed model is sensitive to random seed
lmeval:
model: "hf-multimodal"
model_args:
Expand All @@ -13,7 +14,6 @@ lmeval:
convert_img_format: True
task: mmmu_val_economics
num_fewshot: 0
limit: 1000
metrics:
acc,none: 0.4
acc,none: 0.366
batch_size: 4
8 changes: 8 additions & 0 deletions tests/lmeval/test_lmeval.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
import os
import random
import shutil
from pathlib import Path

import numpy
import pytest
import torch
import yaml
from loguru import logger
from pydantic import BaseModel
Expand Down Expand Up @@ -73,6 +75,12 @@ def set_up(self):
self.quant_type = eval_config.get("quant_type")
self.save_dir = eval_config.get("save_dir")

seed = eval_config.get("seed", None)
if seed is not None:
random.seed(seed)
numpy.random.seed(seed)
torch.manual_seed(seed)

logger.info("========== RUNNING ==============")
logger.info(self.scheme)

Expand Down

0 comments on commit c1fe865

Please sign in to comment.