Hello @cleophass, Thank you for your feedback! We are aware that our methodology may currently overestimate energy consumption. We are launching a workshop this summer to update parts of the methodology to fix that! The issue comes mostly from the energy benchmark we use, which is far from a "large-scale production deployment" scenario. We'll let you know when that's available! PS: Just in case, if you are interested in contributing to this and have some time, I highly encourage you to join us 🤗
---
Hello Ecologits team,
I recently came across the study "How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference" (arXiv:2505.09598v2). The paper introduces an infrastructure-aware benchmarking framework for inference footprint estimation across multiple proprietary and open-source LLMs, including GPT-4o.
According to their benchmarks, GPT-4o consumes approximately 1.788 Wh for long prompts and 0.42 Wh for short ones (as defined in their methodology). This seems considerably lower than the estimates produced by your calculator, which for similar prompt sizes (e.g., 100 input / 300 output) estimates up to 35.1 Wh.
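To give a rough sense of the gap, here is a back-of-the-envelope comparison using only the figures quoted above (the two methodologies define prompt sizes differently, so this is indicative at best):

```python
# Figures quoted above: the paper's long-prompt estimate for GPT-4o
# (arXiv:2505.09598v2) vs. the calculator's estimate for a comparable query.
paper_long_prompt_wh = 1.788   # Wh, long prompt per the paper's methodology
calculator_wh = 35.1           # Wh, ~100 input / 300 output tokens

ratio = calculator_wh / paper_long_prompt_wh
print(f"Calculator estimate is ~{ratio:.0f}x the paper's figure")  # ~20x
```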
At first, the two methodologies appear structurally similar: both estimate energy by multiplying execution time by a power value. As I dove deeper into their methodology, however, I found that instead of relying on leaderboard studies or static power assumptions, the paper focuses on measured GPU power draw and also factors in batch size and dynamic GPU utilization to compute the per-query energy share.
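To make sure I am reading their approach correctly, here is how I understand the per-query energy share. This is a minimal sketch with hypothetical parameter names, not the paper's actual code:

```python
def per_query_energy_wh(gpu_power_w: float,
                        gpu_utilization: float,
                        inference_time_s: float,
                        batch_size: int) -> float:
    """Energy attributed to one query, in Wh.

    Effective power (rated GPU power scaled by dynamic utilization)
    times execution time, divided among the queries served in the
    same batch, then converted from joules to watt-hours.
    """
    energy_joules = gpu_power_w * gpu_utilization * inference_time_s
    return energy_joules / batch_size / 3600.0

# Illustrative numbers only (not taken from the paper):
# a 700 W GPU at 80% utilization, 10 s of generation, batch of 4.
print(per_query_energy_wh(700, 0.8, 10, 4))  # ~0.39 Wh per query
```

Under this reading, larger batch sizes and lower utilization both pull the per-query figure down, which would explain part of the gap with a single-query, full-power assumption.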
Have you had the chance to review this study? I’d be very interested in your perspective on the methodology and whether integrating infrastructure-aware parameters (e.g., TPS, dynamic utilization, or model class) is something you’re considering in future versions.
Thank you in advance for your time, and again for your contributions to the field.