Could you please provide access to the evaluation code? Using the SFT weights you provided, our self-developed evaluation code yields task performance that differs from the results reported by CM2. We are unsure whether there is an issue in some part of our reproduction.