
[Bug] Custom dataset: model evaluation score is 0 #1752

Open
2 tasks done
lomoonmoonbird opened this issue Dec 11, 2024 · 4 comments

@lomoonmoonbird
Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True,
 'CUDA_HOME': None,
 'GCC': 'gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)',
 'GPU 0': 'NVIDIA A30',
 'MMEngine': '0.10.4',
 'MUSA available': False,
 'OpenCV': '4.10.0',
 'PyTorch': '2.3.1',
 'PyTorch compiling details': 'PyTorch built with:\n'
                              '  - GCC 9.3\n'
                              '  - C++ Version: 201703\n'
                              '  - Intel(R) oneAPI Math Kernel Library Version '
                              '2023.1-Product Build 20230303 for Intel(R) 64 '
                              'architecture applications\n'
                              '  - Intel(R) MKL-DNN v3.3.6 (Git Hash '
                              '86e6af5974177e513fd3fee58425e1063e7f1361)\n'
                              '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
                              '  - LAPACK is enabled (usually provided by '
                              'MKL)\n'
                              '  - NNPACK is enabled\n'
                              '  - CPU capability usage: AVX512\n'
                              '  - CUDA Runtime 12.1\n'
                              '  - NVCC architecture flags: '
                              '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
                              '  - CuDNN 8.9.2\n'
                              '  - Magma 2.6.1\n'
                              '  - Build settings: BLAS_INFO=mkl, '
                              'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
                              'CUDNN_VERSION=8.9.2, '
                              'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
                              'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
                              '-fabi-version=11 -fvisibility-inlines-hidden '
                              '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
                              '-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
                              '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
                              '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
                              '-O2 -fPIC -Wall -Wextra -Werror=return-type '
                              '-Werror=non-virtual-dtor -Werror=bool-operation '
                              '-Wnarrowing -Wno-missing-field-initializers '
                              '-Wno-type-limits -Wno-array-bounds '
                              '-Wno-unknown-pragmas -Wno-unused-parameter '
                              '-Wno-unused-function -Wno-unused-result '
                              '-Wno-strict-overflow -Wno-strict-aliasing '
                              '-Wno-stringop-overflow -Wsuggest-override '
                              '-Wno-psabi -Wno-error=pedantic '
                              '-Wno-error=old-style-cast -Wno-missing-braces '
                              '-fdiagnostics-color=always -faligned-new '
                              '-Wno-unused-but-set-variable '
                              '-Wno-maybe-uninitialized -fno-math-errno '
                              '-fno-trapping-math -Werror=format '
                              '-Wno-stringop-overflow, LAPACK_INFO=mkl, '
                              'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
                              'PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, '
                              'USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, '
                              'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
                              'USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, '
                              'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, '
                              'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, '
                              'USE_ROCM_KERNEL_ASSERT=OFF, \n',
 'Python': '3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]',
 'TorchVision': '0.18.1',
 'lmdeploy': "not installed:No module named 'lmdeploy'",
 'numpy_random_seed': 2147483648,
 'opencompass': '0.3.7+',
 'sys.platform': 'linux',
 'transformers': '4.42.3'}

Reproduces the problem - code/configuration sample

configs/datasets/jp_dataset/jp_data1.py:

from opencompass.datasets import JsonlDataset
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import PPLInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator, AccwithDetailsEvaluator
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.datasets import MATHDataset, MATHEvaluator, math_postprocess
from opencompass.utils.text_postprocessors import first_capital_postprocess
reader_cfg = dict(
    input_columns=['question', 'answer'],
    output_column='answer',
)


infer_cfg = dict(
    # Prompt generation config
    prompt_template=dict(
        type=PromptTemplate,
        # Prompt template; its form must match the inferencer type specified below.
        # For PPL scoring, a template must be given for each candidate answer.
        template={
            0: 'calculate the following equation only return result: \nEquation: {question}\nAnswer: ',
            1: 'Answer: {answer}'
        }),
    # In-context example config; `ZeroRetriever` means no in-context examples are used
    retriever=dict(type=ZeroRetriever),
    # Inference method config:
    #   - PPLInferencer picks the answer by PPL (perplexity)
    #   - GenInferencer takes the answer from the model's generated output
    inferencer=dict(type=GenInferencer))

#eval_cfg = dict(
#    evaluator=dict(type=MATHEvaluator), pred_postprocessor=dict(type=math_postprocess))
#eval_cfg = dict(
#    evaluator=dict(type=AccwithDetailsEvaluator))
eval_cfg = dict(
    evaluator=dict(type=AccwithDetailsEvaluator),
    pred_postprocessor=dict(type=first_capital_postprocess))

math_datasets = [
    dict(
        #type=HFDataset,
        type=JsonlDataset,
        abbr='jp_data',
        path='/home/jp_data/qa.jsonl',
        #path="piqa",
        #data_files="/home/jp_data/qa.jsonl",
        reader_cfg=reader_cfg,
        infer_cfg=infer_cfg,
        eval_cfg=eval_cfg,
    )
]
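
Note: the dict-form template above ({0: ..., 1: ...}) is the per-candidate form normally paired with PPLInferencer, while GenInferencer normally takes a single string template. A gen-style sketch for comparison (illustrative, not a verified fix):

infer_cfg_gen = dict(
    prompt_template=dict(
        type=PromptTemplate,
        # Single-string template: the model free-generates the text after 'Answer: '
        template='calculate the following equation only return result: \nEquation: {question}\nAnswer: '),
    retriever=dict(type=ZeroRetriever),
    inferencer=dict(type=GenInferencer))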

configs/models/qwen2_5/jp_qwen2_5_coder_7b.py:

from opencompass.models import HuggingFacewithChatTemplate
from opencompass.models import HuggingFaceCausalLM
#from opencompass.models import ModelScopeCausalLM
#from opencompass.models import GLM130B
models = [
    dict(
        type=HuggingFacewithChatTemplate,
        #type=GLM130B,
        abbr='qwen2_5_coder_jp',
        #path='/home/vllm/model/chatglm3-6b',
        path="/root/Docker/data/llms/qwen2.5-coder-7b",
        max_out_len=1024,
        batch_size=2,
        max_seq_len=4096,
        run_cfg=dict(num_gpus=1),
    )
]
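
For the non-chat comparison raised later in this thread, a base-model variant could reuse the HuggingFaceCausalLM import already present above. A sketch (untested; the abbr is made up here):

models_base = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='qwen2_5_coder_jp_base',  # hypothetical abbr for the non-chat run
        path='/root/Docker/data/llms/qwen2.5-coder-7b',
        max_out_len=1024,
        batch_size=2,
        max_seq_len=4096,
        run_cfg=dict(num_gpus=1),
    )
]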

datasets: /home/jp_data/qa.jsonl:

{"question": "752+361+181+933+235+986=", "answer": "3448"}
{"question": "712+165+223+711=", "answer": "1811"}
{"question": "921+975+888+539=", "answer": "3323"}
{"question": "752+321+388+643+568+982+468+397=", "answer": "4519"}
{"question": "1+1=", "answer": "2"}

config:

datasets=[
    dict(abbr='jp_data',
        eval_cfg=dict(
            evaluator=dict(
                type='opencompass.openicl.icl_evaluator.AccwithDetailsEvaluator'),
            pred_postprocessor=dict(
                type='opencompass.utils.text_postprocessors.first_capital_postprocess')),
        infer_cfg=dict(
            inferencer=dict(
                type='opencompass.openicl.icl_inferencer.GenInferencer'),
            prompt_template=dict(
                template=dict(
                    {0: 'calculate the following equation only return result: \nEquation: {question}\nAnswer: ',
                    1: 'Answer: {answer}'}),
                type='opencompass.openicl.icl_prompt_template.PromptTemplate'),
            retriever=dict(
                type='opencompass.openicl.icl_retriever.ZeroRetriever')),
        path='/home/jp_data/qa.jsonl',
        reader_cfg=dict(
            input_columns=[
                'question',
                'answer',
                ],
            output_column='answer'),
        type='opencompass.datasets.JsonlDataset'),
    ]
models=[
    dict(abbr='qwen2_5_coder_jp',
        batch_size=2,
        max_out_len=1024,
        max_seq_len=4096,
        path='/root/Docker/data/llms/qwen2.5-coder-7b',
        run_cfg=dict(
            num_gpus=1),
        type='opencompass.models.HuggingFacewithChatTemplate'),
    ]
summarizer=dict(
    summary_groups=[
        dict(name='agieval-chinese',
            subsets=[
                'agieval-gaokao-chinese',
                'agieval-gaokao-english',
                'agieval-gaokao-geography',
                'agieval-gaokao-history',
                'agieval-gaokao-biology',
                'agieval-gaokao-chemistry',

(remaining content omitted...)

prediction:

{
    "0": {
        "origin_prompt": "calculate the following equation only return result: \nEquation: 752+361+181+933+235+986=\nAnswer: ",
        "prediction": "3407",
        "gold": "3448"
    },
    "1": {
        "origin_prompt": "calculate the following equation only return result: \nEquation: 712+165+223+711=\nAnswer: ",
        "prediction": "1271",
        "gold": "1811"
    },
    "2": {
        "origin_prompt": "calculate the following equation only return result: \nEquation: 921+975+888+539=\nAnswer: ",
        "prediction": "3,323",
        "gold": "3323"
    },
    "3": {
        "origin_prompt": "calculate the following equation only return result: \nEquation: 752+321+388+643+568+982+468+397=\nAnswer: ",
        "prediction": "4050",
        "gold": "4519"
    },
    "4": {
        "origin_prompt": "calculate the following equation only return result: \nEquation: 1+1=\nAnswer: ",
        "prediction": "2",
        "gold": "2"
    }
}
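
Note that sample 2 predicts '3,323' against gold '3323': even with the right evaluator, a thousands separator would fail an exact-match comparison. A hypothetical postprocessor (not part of OpenCompass, for illustration only) could normalize this:

def strip_commas_postprocess(text: str) -> str:
    # Hypothetical helper: drop thousands separators so '3,323' matches '3323'
    return text.strip().replace(',', '')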

Reproduces the problem - command or script

CUDA_VISIBLE_DEVICES=0 /root/miniconda3/envs/opencompass/bin/python run.py --models jp_qwen2_5_coder_7b --datasets jp_data1

Reproduces the problem - error message

dataset    version    metric    mode      qwen2_5_coder_jp
---------  ---------  --------  ------  ------------------
jp_data    54f972     accuracy  gen                   0.00


Other information

The predictions do contain a correct result, so the score should not be 0. What do I need to change to make the evaluation work correctly?

@MaiziXiao
Collaborator

Re-run the eval with --dump-eval-details added and check whether the answer-extraction logic is correct.
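
Applied to the original command, that would be:

CUDA_VISIBLE_DEVICES=0 /root/miniconda3/envs/opencompass/bin/python run.py --models jp_qwen2_5_coder_7b --datasets jp_data1 --dump-eval-details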

@lomoonmoonbird
Author

lomoonmoonbird commented Dec 11, 2024

Re-run the eval with --dump-eval-details added and check whether the answer-extraction logic is correct.

When I switched the evaluator back to MATHEvaluator, scores appeared. Why is that?

@lomoonmoonbird
Author

lomoonmoonbird commented Dec 11, 2024

Re-run the eval with --dump-eval-details added and check whether the answer-extraction logic is correct.

After adding --dump-eval-details:

eval details:

{
    "accuracy": 0.0,
    "details": {
        "0": {
            "prompt": "calculate the following equation only return result: \nEquation: 752+361+181+933+235+986=\nAnswer: ",
            "pred": "",
            "refr": "3448",
            "is_correct": false
        },
        "1": {
            "prompt": "calculate the following equation only return result: \nEquation: 712+165+223+711=\nAnswer: ",
            "pred": "",
            "refr": "1811",
            "is_correct": false
        },
        "2": {
            "prompt": "calculate the following equation only return result: \nEquation: 921+975+888+539=\nAnswer: ",
            "pred": "",
            "refr": "3323",
            "is_correct": false
        },
        "3": {
            "prompt": "calculate the following equation only return result: \nEquation: 752+321+388+643+568+982+468+397=\nAnswer: ",
            "pred": "",
            "refr": "4519",
            "is_correct": false
        },
        "4": {
            "prompt": "calculate the following equation only return result: \nEquation: 1+1=\nAnswer: ",
            "pred": "",
            "refr": "2",
            "is_correct": false
        }
    }
}
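
The empty pred fields are consistent with the configured pred_postprocessor: first_capital_postprocess keeps only the first uppercase letter it finds (it is meant for A/B/C/D-style options), so a purely numeric answer becomes an empty string. A minimal sketch of that behavior (illustrative; mirrors the postprocessor's documented intent):

def first_capital_postprocess(text: str) -> str:
    # Return the first uppercase letter; multiple-choice answers like 'B' survive,
    # numeric answers like '3448' do not
    for ch in text:
        if ch.isupper():
            return ch
    return ''

first_capital_postprocess('3448')  # -> '' -> every sample is scored incorrect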

@lomoonmoonbird
Author

lomoonmoonbird commented Dec 11, 2024

This also seems related to the model type: when I switch to a non-chat type, the score drops back to 0.

Where in the official documentation is the relationship between the model type, the prompt type, and the dataset evaluation method (gen/ppl) explained?
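
As a rough sketch of the distinction being asked about (illustrative, not taken from the docs): ppl-style datasets give one template per candidate answer and pick the lowest-perplexity completion, which suits base models; gen-style datasets use a single template and parse the model's free-form output, which usually requires a matching postprocessor:

# ppl-style: one template per candidate; the model scores each completion
ppl_template = {
    'A': 'Question: {question}\nAnswer: A',
    'B': 'Question: {question}\nAnswer: B',
}
# gen-style: one template; the generated text is postprocessed and compared to gold
gen_template = 'Question: {question}\nAnswer: '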
