### 🔖 Enhancement description

Details to include:

- ✅ Tokens used for each question (and average per model)
- ✅ Separate model input & output price
- ✅ Duration for each question (and average per model)
- ✅ Cost for each question (and average per model)
- ✅ TPS (tokens per second) of each model, shown like the model's price
- ✅ Total cost (sum of all)
- ✅ Total tool calls made for each question (and average per model)
- A structure that stores old benchmarks too, not just the latest

### 🎤 Pitch

More insightful benchmarks

### 👀 Have you spent some time to check if this issue has been raised before?

- [x] I checked and didn't find similar issue

### 🏢 Have you read the Code of Conduct?

- [x] I have read the [Code of Conduct](https://github.com/appwrite/.github/blob/main/CODE_OF_CONDUCT.md)
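
For discussion, here is a rough sketch of how the metrics listed in the enhancement description could be stored. Every name in it (`ModelPricing`, `QuestionResult`, `ModelRun`, `BenchmarkHistory`, the field names) is hypothetical and not taken from the existing benchmark code. It also assumes prices are quoted in USD per million tokens and that TPS means output tokens per second.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ModelPricing:
    # Assumption: prices are USD per one million tokens.
    input_per_mtok: float
    output_per_mtok: float


@dataclass
class QuestionResult:
    question_id: str
    input_tokens: int
    output_tokens: int
    duration_s: float
    tool_calls: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def cost(self, pricing: ModelPricing) -> float:
        # Input and output tokens are priced separately.
        return (
            self.input_tokens * pricing.input_per_mtok
            + self.output_tokens * pricing.output_per_mtok
        ) / 1_000_000


@dataclass
class ModelRun:
    model: str
    pricing: ModelPricing
    results: list[QuestionResult] = field(default_factory=list)

    def summary(self) -> dict:
        # Raises statistics.StatisticsError if there are no results.
        costs = [r.cost(self.pricing) for r in self.results]
        total_duration = sum(r.duration_s for r in self.results)
        total_output = sum(r.output_tokens for r in self.results)
        return {
            "avg_tokens": mean(r.total_tokens for r in self.results),
            "avg_duration_s": mean(r.duration_s for r in self.results),
            "avg_cost": mean(costs),
            "total_cost": sum(costs),
            "avg_tool_calls": mean(r.tool_calls for r in self.results),
            # Assumption: TPS = output tokens / wall-clock seconds.
            "tps": total_output / total_duration if total_duration > 0 else 0.0,
        }


@dataclass
class BenchmarkHistory:
    # Appending keeps old benchmarks instead of overwriting the latest one.
    runs: list[dict] = field(default_factory=list)

    def add(self, run_id: str, model_runs: list[ModelRun]) -> None:
        self.runs.append(
            {"run_id": run_id, "models": {m.model: m.summary() for m in model_runs}}
        )
```

Keeping raw per-question records and deriving the averages on demand means new aggregate metrics can be added later without re-running old benchmarks.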