Hello @cleophass, Thank you for your feedback! We are aware that our methodology may currently overestimate energy consumption. We are launching a workshop this summer to update parts of the methodology to fix that! The issue comes mostly from the energy benchmark we use, which is far from a "large-scale production deployment" scenario. We'll let you know when that's available! PS: Just in case, if you are interested in contributing to this and have some time, I highly encourage you to join us 🤗
---
Hello Ecologits team,
I recently came across the study "How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference" (arXiv:2505.09598v2). The paper introduces an infrastructure-aware benchmarking framework for inference footprint estimation across multiple proprietary and open-source LLMs, including GPT-4o.
According to their benchmarks, GPT-4o consumes approximately 1.788 Wh for long prompts and 0.42 Wh for short ones (as defined in their methodology). This seems considerably lower than the estimates produced by your calculator, which for similar prompt sizes (e.g., 100 input / 300 output) estimates up to 35.1 Wh.
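To give a rough sense of the gap, here is a back-of-the-envelope comparison using only the figures quoted above (the two methodologies define prompt sizes differently, so this is indicative at best):

```python
# Figures quoted above: the paper's long-prompt estimate for GPT-4o
# (arXiv:2505.09598v2) vs. the calculator's estimate for a comparable query.
paper_long_prompt_wh = 1.788   # Wh, long prompt per the paper's methodology
calculator_wh = 35.1           # Wh, ~100 input / 300 output tokens

ratio = calculator_wh / paper_long_prompt_wh
print(f"Calculator estimate is ~{ratio:.0f}x the paper's figure")  # ~20x
```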
At first, the two methodologies appear structurally similar: both estimate energy by multiplying execution time by a power value. As I dove deeper into their methodology, however, I found that instead of relying on leaderboard studies or static power assumptions, the paper focuses on measured GPU power draw and also factors in batch size and dynamic GPU utilization to compute the per-query energy share.
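To make sure I am reading their approach correctly, here is how I understand the per-query energy share. This is a minimal sketch with hypothetical parameter names, not the paper's actual code:

```python
def per_query_energy_wh(gpu_power_w: float,
                        gpu_utilization: float,
                        inference_time_s: float,
                        batch_size: int) -> float:
    """Energy attributed to one query, in Wh.

    Effective power (rated GPU power scaled by dynamic utilization)
    times execution time, divided among the queries served in the
    same batch, then converted from joules to watt-hours.
    """
    energy_joules = gpu_power_w * gpu_utilization * inference_time_s
    return energy_joules / batch_size / 3600.0

# Illustrative numbers only (not taken from the paper):
# a 700 W GPU at 80% utilization, 10 s of generation, batch of 4.
print(per_query_energy_wh(700, 0.8, 10, 4))  # ~0.39 Wh per query
```

Under this reading, larger batch sizes and lower utilization both pull the per-query figure down, which would explain part of the gap with a single-query, full-power assumption.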
Have you had the chance to review this study? I’d be very interested in your perspective on the methodology and whether integrating infrastructure-aware parameters (e.g., TPS, dynamic utilization, or model class) is something you’re considering in future versions.
Thank you in advance for your time, and again for your contributions to the field.