AI Carbon Footprint Calculator
Most AI carbon tools focus on a single API call. This calculator covers the full picture: model training, inference at scale, and hardware manufacturing across text, image, audio, and video workloads. Built on the TokenFlop computational model (Digital4Better), it lets you configure hardware, PUE, energy mix, and corpus size to simulate realistic enterprise scenarios, not just theoretical benchmarks. Results are order-of-magnitude estimates derived from a physics-based computational model using publicly available hardware specifications and energy data; they are designed for comparative scenario analysis, not direct emissions reporting. Assumptions and limitations are documented in the TokenFlop methodology paper.
Free and openly accessible. Designed for AI teams making infrastructure, model selection, and efficiency decisions, and for organizations building AI governance frameworks aligned with emerging disclosure standards.
FLOPs → GPUh → CO₂e Modeling Framework
A bottom-up modeling method based on estimating the computational load (FLOPs) induced by model usage, converted into GPU time (GPUh), then into energy consumption and GHG emissions. Incorporates the manufacturing footprint of equipment following a Life Cycle Assessment (LCA) approach (ISO 14040 / ITU L.1410).
1. Base Unit and Input Data
The base unit is the token — a discrete unit processed by the model to represent an input or output. Depending on the modality:
- Text: word fragment (3–4 characters on average). 1,000 tokens ≈ 750 words in English.
- Image: spatial patch (e.g. 512×512 image with 16×16 patches → 1,024 tokens).
- Audio: temporal token from a codec (e.g. a 10 s clip at 24 kHz, with a downsampling factor of 320 and 8 codec channels → ~6,000 tokens).
- Video: spatial token per frame × number of frames (e.g. 4s at 24fps, 512×512, 16×16 patches → ~98,304 tokens).
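The per-modality token estimates above can be sketched as follows; the patch size, codec downsampling factor, and channel count are the document's example values, not universal defaults.

```python
# Illustrative token-count estimates per modality, using the
# example parameters given above (not universal defaults).

def text_tokens(words: int) -> int:
    """~1,000 tokens per 750 English words."""
    return round(words * 1000 / 750)

def image_tokens(height: int, width: int, patch: int = 16) -> int:
    """One token per spatial patch."""
    return (height // patch) * (width // patch)

def audio_tokens(seconds: float, sample_rate: int = 24_000,
                 downscale: int = 320, channels: int = 8) -> int:
    """Temporal codec tokens: samples / downsampling factor, per channel."""
    return round(seconds * sample_rate / downscale) * channels

def video_tokens(seconds: float, fps: int = 24,
                 height: int = 512, width: int = 512, patch: int = 16) -> int:
    """Spatial tokens per frame x number of frames."""
    return image_tokens(height, width, patch) * round(seconds * fps)

print(image_tokens(512, 512))  # 1,024 tokens for a 512x512 image
print(audio_tokens(10))        # 6,000 tokens for a 10 s clip
print(video_tokens(4))         # 98,304 tokens for 4 s at 24 fps
```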
2. Computational Load Estimation (FLOPs)
Computational load is estimated by usage phase:
| Phase | Formula |
|---|---|
| Training | FLOP ≈ 6 × P_total × T_training |
| Fine-tuning | FLOP ≈ (2 × P_total + 4 × P_tunable) × T_training |
| Inference — prompt processing | FLOP ≈ 1 × P_active × T_input |
| Inference — text generation | FLOP ≈ 2 × P_active × T_output |
| Image generation | FLOP ≈ 2 × P_active × N_activation |
| Video generation (spatio-temporal) | FLOP ≈ S × (2 × P_active × N_activation × F + 2 × (F×T)² × d) |
Inference assumption: a KV cache is always present, reducing prompt-processing cost to ~1 FLOP per parameter per input token.
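The text-modality rows of the table translate directly into code. A minimal sketch, where `p_*` are parameter counts and `t_*` are token counts; the 405B/15T example reuses the Llama 3.1 figures cited later in this document.

```python
# Per-phase FLOP estimates, following the formulas in the table above.
# p_* = parameter counts, t_* = token counts.

def training_flops(p_total: float, t_training: float) -> float:
    return 6 * p_total * t_training

def finetuning_flops(p_total: float, p_tunable: float, t_training: float) -> float:
    return (2 * p_total + 4 * p_tunable) * t_training

def prompt_flops(p_active: float, t_input: float) -> float:
    # KV cache assumed: ~1 FLOP per parameter per input token
    return 1 * p_active * t_input

def generation_flops(p_active: float, t_output: float) -> float:
    return 2 * p_active * t_output

# Llama 3.1 405B pre-training: 6 x 405e9 x 15e12 ≈ 3.6e25 FLOPs
print(f"{training_flops(405e9, 15e12):.3e}")
```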
3. Conversion to GPU Time (GPUh)
D_gpu = FLOP / (C_gpu × MFU)
- C_gpu: theoretical GPU capacity in FLOP/h
- MFU (Model FLOP Utilization): fraction of theoretical capacity actually achieved, estimated between 25% and 50% depending on the model and hardware (source: NVIDIA benchmarks). Default value: 40% for training.
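A worked sketch of the conversion. The H100 peak of ~989 TFLOP/s (BF16, dense) is an assumption taken from NVIDIA's published specifications; the resulting figure is order-of-magnitude only and shifts with the MFU chosen.

```python
# FLOPs -> GPU-hours, per D_gpu = FLOP / (C_gpu x MFU).
# Assumed peak: H100 at ~989 TFLOP/s (BF16, dense), expressed in FLOP/h.

H100_FLOP_PER_HOUR = 989e12 * 3600  # theoretical capacity C_gpu in FLOP/h

def gpu_hours(flops: float,
              c_gpu: float = H100_FLOP_PER_HOUR,
              mfu: float = 0.40) -> float:
    return flops / (c_gpu * mfu)

# e.g. ~3.645e25 training FLOPs (a 405B model on 15T tokens) at MFU 40%
print(f"{gpu_hours(3.645e25):.2e} GPUh")  # ~2.6e7 GPUh
```

Lowering the MFU toward the bottom of the 25–50% range raises the GPU-hour estimate proportionally, which is why the utilization assumption dominates the uncertainty of this step.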
4. Conversion to Energy Consumption
E_gpu = D_gpu × P_gpu
E_datacenter = E_gpu × PUE
- P_gpu: GPU power in Watts (e.g. 700 W for an H100)
- PUE (Power Usage Effectiveness): datacenter energy efficiency. Default value: 1.2
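A minimal sketch of this step, using the document's defaults (700 W per H100, PUE 1.2):

```python
# GPU-hours -> energy (kWh), then datacenter energy including PUE overhead.

def gpu_energy_kwh(d_gpu_hours: float, p_gpu_watts: float = 700) -> float:
    # W x h -> Wh, divided by 1,000 -> kWh
    return d_gpu_hours * p_gpu_watts / 1000

def datacenter_energy_kwh(e_gpu_kwh: float, pue: float = 1.2) -> float:
    return e_gpu_kwh * pue

e_gpu = gpu_energy_kwh(1000)        # 1,000 GPUh at 700 W -> 700 kWh
print(datacenter_energy_kwh(e_gpu)) # x 1.2 PUE -> 840 kWh
```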
5. Operational Environmental Impact
I_operational = E_datacenter × F_energy
- F_energy: electricity emission factor by region, sourced from the Digital4Better open data repository (e.g. 0.420 kgCO₂e/kWh for the United States, 0.040 kgCO₂e/kWh for France).
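With the two example factors from the text, the same energy consumption yields an order-of-magnitude different footprint depending on region:

```python
# Operational emissions: energy x regional emission factor.
# Factors from the text: US 0.420, France 0.040 kgCO2e/kWh.

EMISSION_FACTORS = {"US": 0.420, "FR": 0.040}  # kgCO2e per kWh

def operational_kgco2e(energy_kwh: float, region: str) -> float:
    return energy_kwh * EMISSION_FACTORS[region]

print(operational_kgco2e(840, "US"))  # 840 kWh in the United States
print(operational_kgco2e(840, "FR"))  # same energy in France: ~10x lower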
6. Manufacturing Impact (Embodied Footprint)
I_embodied = I_manufacturing × (D_usage / D_lifespan)
The manufacturing footprint is allocated proportionally to usage time over the equipment’s estimated lifespan (5 years by default). Non-GPU server components (CPU, RAM, storage, chassis) are distributed proportionally to the number of GPUs per server.
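The allocation rule can be sketched as below; the 1,500 kgCO₂e manufacturing figure is a placeholder assumption for illustration, not LCA data.

```python
# Embodied impact allocated over lifespan:
# I_embodied = I_manufacturing x (D_usage / D_lifespan).

HOURS_PER_YEAR = 24 * 365

def embodied_kgco2e(i_manufacturing_kg: float, usage_hours: float,
                    lifespan_years: float = 5) -> float:
    return i_manufacturing_kg * usage_hours / (lifespan_years * HOURS_PER_YEAR)

# e.g. a GPU plus its share of server components, with an assumed
# 1,500 kgCO2e manufacturing footprint, used for 1,000 hours
print(round(embodied_kgco2e(1500, 1000), 1))  # ~34 kgCO2e allocated
```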
7. Illustrative Application — Llama 3.1 405B
As a consistency check, TokenFlop was applied to the open-source Llama 3.1 models (8B to 405B parameters), each trained on ~15 trillion tokens; the 405B run used 24,576 H100 GPUs:
| Model | Estimated GPU Time | Estimated Emissions |
|---|---|---|
| Llama 3.1 8B | 1.46M GPUh | ~420 tCO₂e |
| Llama 3.1 70B | 7.0M GPUh | ~2,040 tCO₂e |
| Llama 3.1 405B | 30.84M GPUh | ~8,930 tCO₂e |
Discrepancy vs. Hugging Face data: < 2%, validating the consistency of the model.
For inference, with an average prompt of 400 tokens on Llama 3.1 405B: ~0.1 gCO₂e per request.
Assumptions and Limitations
Results produced by this simulator are theoretical modeling estimates. They do not constitute a direct measurement of actual emissions.
Main sources of uncertainty include:
- Actual model characteristics (often confidential): training data, effective MFU, number of hidden dimensions.
- Lack of reliable LCA data for certain AI-specific hardware.
- TPU, FPGA, and ASIC specificities are not accounted for.
- Memory adequacy between the model and selected hardware is not verified.
This method is suited for relative scenario comparison, AI project scoping, and prospective assessment — not for certified emissions reporting.
Bibliography
[1] Schwartz, R., et al. (2020). Green AI. Communications of the ACM.
[2] IEA (2024). Energy and AI.
[3] ISO 14040/14044. Environmental management — Life Cycle Assessment.
[4] ITU L.1410. Methodology for the assessment of the environmental life cycle impact of ICT goods, networks and services.
[5] Meta (2024). The Llama 3 Herd of Models. arXiv
[6] Digital4Better. Open Data Repository. digital4better.github.io/data
[7] Digital4Better. Applied Methodology for Generative AI. digital4better.github.io/methodology/ai
[8] NVIDIA (2025). Llama 3.1 70B DGXC Benchmarking.