
Difficulty reproducing results on Llama-3.1-8B-Instruct: Significant performance gap compared to Mistral #1

@Liam-L2

Description


Thank you for your impressive work on AnDPro!
I am currently trying to reproduce the results presented in your paper. I have successfully reproduced the performance on Mistral-7B-Instruct-v0.2, matching the results reported in Table 11.
However, I am facing difficulties reproducing the results on Llama-3.1-8B-Instruct. According to Table 3 in Appendix C.12, AnDPro should also achieve SOTA performance on Llama-3.1. In my experiments, the performance drops significantly on Llama-3.1 compared to the full cache baseline, whereas it works perfectly on Mistral.
My Setup:
● Model: Llama-3.1-8B-Instruct
● Hyperparameters: Aligned with the paper (Window Size = 32, Chunk Size = 4, $b=0$).
● Method: Using cross-head budget allocation and chunking as described in the implementation details (a sketch of my understanding is included below).
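
For reference, here is a minimal sketch of the chunk-scoring and cross-head budget-allocation logic as I currently understand it from the paper. All function and variable names here (`chunk_scores`, `allocate_cross_head_budget`, etc.) are my own, not from your codebase, so please point out where my reading diverges from the actual implementation:

```python
import numpy as np

# Hedged sketch of my understanding of the chunked KV-cache selection.
# These names and this structure are mine, not the official implementation.

WINDOW_SIZE = 32  # local window of most-recent tokens, kept unconditionally
                  # (not part of the chunk selection shown below)
CHUNK_SIZE = 4    # prefix tokens grouped into chunks of 4
B = 0             # the paper's b hyperparameter; not exercised in this toy sketch

def chunk_scores(attn, chunk_size=CHUNK_SIZE):
    """Aggregate per-token attention mass into per-chunk scores.

    attn: (num_heads, seq_len) attention weights from the observation window.
    Returns: (num_heads, num_chunks) summed score per chunk.
    """
    num_heads, seq_len = attn.shape
    num_chunks = seq_len // chunk_size
    trimmed = attn[:, : num_chunks * chunk_size]
    return trimmed.reshape(num_heads, num_chunks, chunk_size).sum(axis=-1)

def allocate_cross_head_budget(scores, total_budget):
    """Select the top-scoring (head, chunk) pairs jointly across all heads,
    rather than giving every head the same fixed per-head budget."""
    flat = scores.ravel()
    keep = np.argsort(flat)[::-1][:total_budget]
    mask = np.zeros_like(flat, dtype=bool)
    mask[keep] = True
    return mask.reshape(scores.shape)  # (num_heads, num_chunks) keep-mask

# Toy usage: 8 heads, 256 prefix tokens, keep 128 chunks in total.
rng = np.random.default_rng(0)
attn = rng.random((8, 256))
attn /= attn.sum(axis=-1, keepdims=True)
mask = allocate_cross_head_budget(chunk_scores(attn), total_budget=128)
print(mask.sum(), "chunks kept across all heads")
```

This is exactly the pipeline that reproduces Table 11 for me on Mistral, so I suspect the gap on Llama-3.1 comes from something model-specific (e.g., GQA head grouping or RoPE handling) that this sketch does not capture.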

Any guidance or reference implementation details regarding Llama-3.1 would be greatly appreciated.
