
[Bug] Custom dataset: model evaluation score is 0 #1752

Open
2 tasks done
lomoonmoonbird opened this issue Dec 11, 2024 · 4 comments

@lomoonmoonbird
Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True,
 'CUDA_HOME': None,
 'GCC': 'gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)',
 'GPU 0': 'NVIDIA A30',
 'MMEngine': '0.10.4',
 'MUSA available': False,
 'OpenCV': '4.10.0',
 'PyTorch': '2.3.1',
 'PyTorch compiling details': 'PyTorch built with:\n'
                              '  - GCC 9.3\n'
                              '  - C++ Version: 201703\n'
                              '  - Intel(R) oneAPI Math Kernel Library Version '
                              '2023.1-Product Build 20230303 for Intel(R) 64 '
                              'architecture applications\n'
                              '  - Intel(R) MKL-DNN v3.3.6 (Git Hash '
                              '86e6af5974177e513fd3fee58425e1063e7f1361)\n'
                              '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
                              '  - LAPACK is enabled (usually provided by '
                              'MKL)\n'
                              '  - NNPACK is enabled\n'
                              '  - CPU capability usage: AVX512\n'
                              '  - CUDA Runtime 12.1\n'
                              '  - NVCC architecture flags: '
                              '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
                              '  - CuDNN 8.9.2\n'
                              '  - Magma 2.6.1\n'
                              '  - Build settings: BLAS_INFO=mkl, '
                              'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
                              'CUDNN_VERSION=8.9.2, '
                              'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
                              'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
                              '-fabi-version=11 -fvisibility-inlines-hidden '
                              '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
                              '-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
                              '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
                              '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
                              '-O2 -fPIC -Wall -Wextra -Werror=return-type '
                              '-Werror=non-virtual-dtor -Werror=bool-operation '
                              '-Wnarrowing -Wno-missing-field-initializers '
                              '-Wno-type-limits -Wno-array-bounds '
                              '-Wno-unknown-pragmas -Wno-unused-parameter '
                              '-Wno-unused-function -Wno-unused-result '
                              '-Wno-strict-overflow -Wno-strict-aliasing '
                              '-Wno-stringop-overflow -Wsuggest-override '
                              '-Wno-psabi -Wno-error=pedantic '
                              '-Wno-error=old-style-cast -Wno-missing-braces '
                              '-fdiagnostics-color=always -faligned-new '
                              '-Wno-unused-but-set-variable '
                              '-Wno-maybe-uninitialized -fno-math-errno '
                              '-fno-trapping-math -Werror=format '
                              '-Wno-stringop-overflow, LAPACK_INFO=mkl, '
                              'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
                              'PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, '
                              'USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, '
                              'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
                              'USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, '
                              'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, '
                              'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, '
                              'USE_ROCM_KERNEL_ASSERT=OFF, \n',
 'Python': '3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]',
 'TorchVision': '0.18.1',
 'lmdeploy': "not installed:No module named 'lmdeploy'",
 'numpy_random_seed': 2147483648,
 'opencompass': '0.3.7+',
 'sys.platform': 'linux',
 'transformers': '4.42.3'}

Reproduces the problem - code/configuration sample

configs/datasets/jp_dataset/jp_data1.py:

from opencompass.datasets import JsonlDataset
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import PPLInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator, AccwithDetailsEvaluator
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.datasets import MATHDataset, MATHEvaluator, math_postprocess
from opencompass.utils.text_postprocessors import first_capital_postprocess
reader_cfg = dict(
    input_columns=['question', 'answer'],
    output_column='answer',
)


infer_cfg = dict(
    # Prompt generation config
    prompt_template=dict(
        type=PromptTemplate,
        # Prompt template; its form must match the inferencer type specified below.
        # For PPL scoring, a template must be given for each candidate answer.
        template={
            0: 'calculate the following equation only return result: \nEquation: {question}\nAnswer: ',
            1: 'Answer: {answer}'
        }),
    # In-context example config; `ZeroRetriever` means no in-context examples are used
    retriever=dict(type=ZeroRetriever),
    # Inference method config:
    #   - PPLInferencer picks the answer by PPL (perplexity)
    #   - GenInferencer takes the answer from the model's generated output
    inferencer=dict(type=GenInferencer))

#eval_cfg = dict(
#    evaluator=dict(type=MATHEvaluator), pred_postprocessor=dict(type=math_postprocess))
#eval_cfg = dict(
#    evaluator=dict(type=AccwithDetailsEvaluator))
eval_cfg = dict(
    evaluator=dict(type=AccwithDetailsEvaluator),
    pred_postprocessor=dict(type=first_capital_postprocess))

math_datasets = [
    dict(
        #type=HFDataset,
        type=JsonlDataset,
        abbr='jp_data',
        path='/home/jp_data/qa.jsonl',
        #path="piqa",
        #data_files="/home/jp_data/qa.jsonl",
        reader_cfg=reader_cfg,
        infer_cfg=infer_cfg,
        eval_cfg=eval_cfg,
    )
]
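
Note: the dict-form template above ({0: ..., 1: ...}) is the per-candidate form normally paired with PPLInferencer, while GenInferencer normally takes a single string template. A gen-style sketch for comparison (illustrative, not a verified fix):

infer_cfg_gen = dict(
    prompt_template=dict(
        type=PromptTemplate,
        # Single-string template: the model free-generates the text after 'Answer: '
        template='calculate the following equation only return result: \nEquation: {question}\nAnswer: '),
    retriever=dict(type=ZeroRetriever),
    inferencer=dict(type=GenInferencer))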

configs/models/qwen2_5/jp_qwen2_5_coder_7b.py:

from opencompass.models import HuggingFacewithChatTemplate
from opencompass.models import HuggingFaceCausalLM
#from opencompass.models import ModelScopeCausalLM
#from opencompass.models import GLM130B
models = [
    dict(
        type=HuggingFacewithChatTemplate,
        #type=GLM130B,
        abbr='qwen2_5_coder_jp',
        #path='/home/vllm/model/chatglm3-6b',
        path="/root/Docker/data/llms/qwen2.5-coder-7b",
        max_out_len=1024,
        batch_size=2,
        max_seq_len=4096,
        run_cfg=dict(num_gpus=1),
    )
]
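
For the non-chat comparison raised later in this thread, a base-model variant could reuse the HuggingFaceCausalLM import already present above. A sketch (untested; the abbr is made up here):

models_base = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='qwen2_5_coder_jp_base',  # hypothetical abbr for the non-chat run
        path='/root/Docker/data/llms/qwen2.5-coder-7b',
        max_out_len=1024,
        batch_size=2,
        max_seq_len=4096,
        run_cfg=dict(num_gpus=1),
    )
]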

datasets: /home/jp_data/qa.jsonl:

{"question": "752+361+181+933+235+986=", "answer": "3448"}
{"question": "712+165+223+711=", "answer": "1811"}
{"question": "921+975+888+539=", "answer": "3323"}
{"question": "752+321+388+643+568+982+468+397=", "answer": "4519"}
{"question": "1+1=", "answer": "2"}

config:

datasets=[
    dict(abbr='jp_data',
        eval_cfg=dict(
            evaluator=dict(
                type='opencompass.openicl.icl_evaluator.AccwithDetailsEvaluator'),
            pred_postprocessor=dict(
                type='opencompass.utils.text_postprocessors.first_capital_postprocess')),
        infer_cfg=dict(
            inferencer=dict(
                type='opencompass.openicl.icl_inferencer.GenInferencer'),
            prompt_template=dict(
                template=dict(
                    {0: 'calculate the following equation only return result: \nEquation: {question}\nAnswer: ',
                    1: 'Answer: {answer}'}),
                type='opencompass.openicl.icl_prompt_template.PromptTemplate'),
            retriever=dict(
                type='opencompass.openicl.icl_retriever.ZeroRetriever')),
        path='/home/jp_data/qa.jsonl',
        reader_cfg=dict(
            input_columns=[
                'question',
                'answer',
                ],
            output_column='answer'),
        type='opencompass.datasets.JsonlDataset'),
    ]
models=[
    dict(abbr='qwen2_5_coder_jp',
        batch_size=2,
        max_out_len=1024,
        max_seq_len=4096,
        path='/root/Docker/data/llms/qwen2.5-coder-7b',
        run_cfg=dict(
            num_gpus=1),
        type='opencompass.models.HuggingFacewithChatTemplate'),
    ]
summarizer=dict(
    summary_groups=[
        dict(name='agieval-chinese',
            subsets=[
                'agieval-gaokao-chinese',
                'agieval-gaokao-english',
                'agieval-gaokao-geography',
                'agieval-gaokao-history',
                'agieval-gaokao-biology',
                'agieval-gaokao-chemistry',

(remaining content omitted...)

prediction:

{
    "0": {
        "origin_prompt": "calculate the following equation only return result: \nEquation: 752+361+181+933+235+986=\nAnswer: ",
        "prediction": "3407",
        "gold": "3448"
    },
    "1": {
        "origin_prompt": "calculate the following equation only return result: \nEquation: 712+165+223+711=\nAnswer: ",
        "prediction": "1271",
        "gold": "1811"
    },
    "2": {
        "origin_prompt": "calculate the following equation only return result: \nEquation: 921+975+888+539=\nAnswer: ",
        "prediction": "3,323",
        "gold": "3323"
    },
    "3": {
        "origin_prompt": "calculate the following equation only return result: \nEquation: 752+321+388+643+568+982+468+397=\nAnswer: ",
        "prediction": "4050",
        "gold": "4519"
    },
    "4": {
        "origin_prompt": "calculate the following equation only return result: \nEquation: 1+1=\nAnswer: ",
        "prediction": "2",
        "gold": "2"
    }
}
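
Note that sample 2 predicts '3,323' against gold '3323': even with the right evaluator, a thousands separator would fail an exact-match comparison. A hypothetical postprocessor (not part of OpenCompass, for illustration only) could normalize this:

def strip_commas_postprocess(text: str) -> str:
    # Hypothetical helper: drop thousands separators so '3,323' matches '3323'
    return text.strip().replace(',', '')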

Reproduces the problem - command or script

CUDA_VISIBLE_DEVICES=0 /root/miniconda3/envs/opencompass/bin/python run.py --models jp_qwen2_5_coder_7b --datasets jp_data1

Reproduces the problem - error message

dataset    version    metric    mode      qwen2_5_coder_jp
---------  ---------  --------  ------  ------------------
jp_data    54f972     accuracy  gen                   0.00


Other information

The predictions do contain a correct result, so the score should not be 0. What do I need to change to make the evaluation work correctly?

@MaiziXiao
Collaborator

Re-run the eval with --dump-eval-details added and check whether the answer-extraction logic is correct.
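
Applied to the original command, that would be:

CUDA_VISIBLE_DEVICES=0 /root/miniconda3/envs/opencompass/bin/python run.py --models jp_qwen2_5_coder_7b --datasets jp_data1 --dump-eval-details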

@lomoonmoonbird
Author

lomoonmoonbird commented Dec 11, 2024

Re-run the eval with --dump-eval-details added and check whether the answer-extraction logic is correct.

When I switched the evaluator back to MATHEvaluator, scores appeared. Why is that?

@lomoonmoonbird
Author

lomoonmoonbird commented Dec 11, 2024

Re-run the eval with --dump-eval-details added and check whether the answer-extraction logic is correct.

After adding --dump-eval-details:

eval details:

{
    "accuracy": 0.0,
    "details": {
        "0": {
            "prompt": "calculate the following equation only return result: \nEquation: 752+361+181+933+235+986=\nAnswer: ",
            "pred": "",
            "refr": "3448",
            "is_correct": false
        },
        "1": {
            "prompt": "calculate the following equation only return result: \nEquation: 712+165+223+711=\nAnswer: ",
            "pred": "",
            "refr": "1811",
            "is_correct": false
        },
        "2": {
            "prompt": "calculate the following equation only return result: \nEquation: 921+975+888+539=\nAnswer: ",
            "pred": "",
            "refr": "3323",
            "is_correct": false
        },
        "3": {
            "prompt": "calculate the following equation only return result: \nEquation: 752+321+388+643+568+982+468+397=\nAnswer: ",
            "pred": "",
            "refr": "4519",
            "is_correct": false
        },
        "4": {
            "prompt": "calculate the following equation only return result: \nEquation: 1+1=\nAnswer: ",
            "pred": "",
            "refr": "2",
            "is_correct": false
        }
    }
}
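
The empty pred fields are consistent with the configured pred_postprocessor: first_capital_postprocess keeps only the first uppercase letter it finds (it is meant for A/B/C/D-style options), so a purely numeric answer becomes an empty string. A minimal sketch of that behavior (illustrative; mirrors the postprocessor's documented intent):

def first_capital_postprocess(text: str) -> str:
    # Return the first uppercase letter; multiple-choice answers like 'B' survive,
    # numeric answers like '3448' do not
    for ch in text:
        if ch.isupper():
            return ch
    return ''

first_capital_postprocess('3448')  # -> '' -> every sample is scored incorrect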

@lomoonmoonbird
Author

lomoonmoonbird commented Dec 11, 2024

This also seems related to the model type: when I switch to a non-chat type, the score drops back to 0.

Where in the official documentation is the relationship between the model type, the prompt type, and the dataset evaluation method (gen/ppl) explained?
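
As a rough sketch of the distinction being asked about (illustrative, not taken from the docs): ppl-style datasets give one template per candidate answer and pick the lowest-perplexity completion, which suits base models; gen-style datasets use a single template and parse the model's free-form output, which usually requires a matching postprocessor:

# ppl-style: one template per candidate; the model scores each completion
ppl_template = {
    'A': 'Question: {question}\nAnswer: A',
    'B': 'Question: {question}\nAnswer: B',
}
# gen-style: one template; the generated text is postprocessed and compared to gold
gen_template = 'Question: {question}\nAnswer: '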
