AI Carbon Footprint Calculator

Most AI carbon tools focus on a single API call. This calculator covers the full picture: model training, inference at scale, and hardware manufacturing across text, image, audio, and video workloads. Built on the TokenFlop computational model (Digital4Better), it lets you configure hardware, PUE, energy mix, and corpus size to simulate realistic enterprise scenarios, not just theoretical benchmarks. Results are order-of-magnitude estimates derived from a physics-based computational model using publicly available hardware specifications and energy data. They are designed for comparative scenario analysis, not direct emissions reporting. Assumptions and limitations are documented in the [TokenFlop methodology paper].

Free and openly accessible. Designed for AI teams making infrastructure, model selection, and efficiency decisions, and for organizations building AI governance frameworks aligned with emerging disclosure standards.

FLOPs → GPUh → CO₂e Modeling Framework

A bottom-up modeling method that estimates the computational load (FLOPs) induced by model usage, converts it into GPU time (GPUh), then into energy consumption and GHG emissions. The method also incorporates the manufacturing footprint of equipment, following a Life Cycle Assessment (LCA) approach (ISO 14040 / ITU L.1410).


1. Base Unit and Input Data

The base unit is the token — a discrete unit processed by the model to represent an input or output. Depending on the modality:

  • Text: word fragment (3–4 characters on average). 1,000 tokens ≈ 750 words in English.
  • Image: spatial patch (e.g. 512×512 image with 16×16 patches → 1,024 tokens).
  • Audio: temporal token from a codec (e.g. 10s clip at 24 kHz, 320 downscale, 8 channels → ~6,000 tokens).
  • Video: spatial token per frame × number of frames (e.g. 4s at 24fps, 512×512, 16×16 patches → ~98,304 tokens).
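The example figures above can be reproduced with a short sketch (the patch size, codec downscale, and channel count are the illustrative settings from the list, not universal constants):

```python
def image_tokens(height: int, width: int, patch: int = 16) -> int:
    """Spatial patches: one token per patch."""
    return (height // patch) * (width // patch)

def audio_tokens(seconds: float, sample_rate: int = 24_000,
                 downscale: int = 320, channels: int = 8) -> int:
    """Temporal codec tokens: samples / downscale, per codec channel."""
    return int(seconds * sample_rate / downscale) * channels

def video_tokens(seconds: float, fps: int, height: int, width: int,
                 patch: int = 16) -> int:
    """Spatial tokens per frame × number of frames."""
    frames = int(seconds * fps)
    return frames * image_tokens(height, width, patch)

print(image_tokens(512, 512))         # 1024
print(audio_tokens(10))               # 6000
print(video_tokens(4, 24, 512, 512))  # 98304
```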

2. Computational Load Estimation (FLOPs)

Computational load is estimated by usage phase:

  • Training: FLOP ≈ 6 × P_total × T_training
  • Fine-tuning: FLOP ≈ (2 × P_total + 4 × P_tunable) × T_training
  • Inference — prompt processing: FLOP ≈ 1 × P_active × T_input
  • Inference — text generation: FLOP ≈ 2 × P_active × T_output
  • Image generation: FLOP ≈ 2 × P_active × N_activation
  • Video generation (spatio-temporal): FLOP ≈ S × (2 × P_active × N_activation × F + 2 × (F×T)² × d)

Inference assumption: a KV cache is assumed to be present, reducing prompt-processing cost to ~1 FLOP per parameter per token.
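The per-phase formulas can be sketched in Python (function names are mine; the symbols follow the table above):

```python
def training_flops(p_total: float, t_training: float) -> float:
    """Training: FLOP ≈ 6 × P_total × T_training."""
    return 6 * p_total * t_training

def finetuning_flops(p_total: float, p_tunable: float, t_training: float) -> float:
    """Fine-tuning: forward over all parameters, backward over tunable ones."""
    return (2 * p_total + 4 * p_tunable) * t_training

def prompt_flops(p_active: float, t_input: float) -> float:
    """Prompt processing with a KV cache: ~1 FLOP per parameter per token."""
    return 1 * p_active * t_input

def generation_flops(p_active: float, t_output: float) -> float:
    """Text generation: ~2 FLOPs per parameter per output token."""
    return 2 * p_active * t_output

def image_generation_flops(p_active: float, n_activation: float) -> float:
    """Image generation over N_activation tokens."""
    return 2 * p_active * n_activation

def video_generation_flops(p_active: float, n_activation: float,
                           f: int, t: int, d: int, s: int) -> float:
    """S sampling steps × (per-frame generation + spatio-temporal attention)."""
    return s * (2 * p_active * n_activation * f + 2 * (f * t) ** 2 * d)
```

For example, training_flops(405e9, 15e12) gives roughly 3.6 × 10²⁵ FLOP for a 405B-parameter model trained on 15 trillion tokens.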


3. Conversion to GPU Time (GPUh)

D_gpu = FLOP / (C_gpu × MFU)

  • C_gpu: theoretical GPU capacity in FLOP/h
  • MFU (Model FLOP Utilization): percentage of theoretical capacity effectively usable, estimated between 25% and 50% depending on model and hardware type (source: NVIDIA Benchmarks). Default value: 40% for training.
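A minimal sketch of the conversion (the text quotes C_gpu in FLOP/h; the helper below takes FLOP/s, as hardware specs are usually quoted per second, and converts to hours):

```python
def gpu_hours(flops: float, c_gpu_flop_per_s: float, mfu: float = 0.40) -> float:
    """D_gpu = FLOP / (C_gpu × MFU). C_gpu in FLOP/s, result in hours."""
    return flops / (c_gpu_flop_per_s * mfu * 3600)
```

Assuming an H100 dense BF16 peak of ~989 TFLOP/s (an assumed spec, not from the text), a 3.6 × 10²⁵ FLOP training run maps to on the order of 25M GPUh at the default 40% MFU.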

4. Conversion to Energy Consumption

E_gpu = D_gpu × P_gpu
E_datacenter = E_gpu × PUE

  • P_gpu: GPU power in Watts (e.g. 700 W for an H100)
  • PUE (Power Usage Effectiveness): datacenter energy efficiency. Default value: 1.2
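Both steps combine into one small helper (defaults taken from the text):

```python
def datacenter_energy_kwh(d_gpu_hours: float, p_gpu_watts: float = 700.0,
                          pue: float = 1.2) -> float:
    """E_gpu = D_gpu × P_gpu, then scaled by PUE for datacenter overhead."""
    e_gpu_kwh = d_gpu_hours * p_gpu_watts / 1000.0  # Wh → kWh
    return e_gpu_kwh * pue
```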

5. Operational Environmental Impact

I_operational = E_gpu × F_energy

  • F_energy: electricity emission factor by region, sourced from the Digital4Better open data repository (e.g. 0.420 kgCO₂e/kWh for the United States, 0.040 kgCO₂e/kWh for France).
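A sketch using the two example factors quoted above (the full open data repository covers more regions):

```python
# Emission factors from the text, in kgCO2e/kWh
F_ENERGY = {"US": 0.420, "FR": 0.040}

def operational_impact_kg(e_kwh: float, region: str) -> float:
    """I_operational = E × F_energy."""
    return e_kwh * F_ENERGY[region]
```

Under these factors, the same workload emits roughly ten times less in France than in the United States.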

6. Manufacturing Impact (Embodied Footprint)

I_embodied = I_manufacturing × (D_usage / D_lifespan)

The manufacturing footprint is allocated proportionally to usage time over the equipment’s estimated lifespan (5 years by default). Non-GPU server components (CPU, RAM, storage, chassis) are distributed proportionally to the number of GPUs per server.
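The pro-rata allocation can be sketched as:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

def embodied_impact_kg(i_manufacturing_kg: float, usage_hours: float,
                       lifespan_years: float = 5.0) -> float:
    """I_embodied = I_manufacturing × (D_usage / D_lifespan)."""
    return i_manufacturing_kg * usage_hours / (lifespan_years * HOURS_PER_YEAR)
```

With the 5-year default, a GPU used for one month is allocated roughly 1/60 of its manufacturing footprint.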


7. Illustrative Application — Llama 3.1 405B

As a consistency check, TokenFlop was applied to the open-source Llama 3.1 model family; the 405B-parameter variant was trained on ~15 trillion tokens using 24,576 H100 GPUs:

  • Llama 3.1 8B: 1.46M GPUh, ~420 tCO₂e
  • Llama 3.1 70B: 7.0M GPUh, ~2,040 tCO₂e
  • Llama 3.1 405B: 30.84M GPUh, ~8,930 tCO₂e

Discrepancy vs. Hugging Face data: < 2%, validating the consistency of the model.

For inference, with an average prompt of 400 tokens on Llama 3.1 405B: ~0.1 gCO₂e per request.
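As a rough cross-check of the training figures, one can back out the MFU implied by the 30.84M GPUh estimate, assuming an H100 dense BF16 peak of ~989 TFLOP/s (an assumed spec, not stated in the text):

```python
P_TOTAL = 405e9         # parameters
T_TRAINING = 15e12      # training tokens
REPORTED_GPUH = 30.84e6
C_H100 = 989e12         # FLOP/s, assumed H100 dense BF16 peak

flops = 6 * P_TOTAL * T_TRAINING  # ≈ 3.6e25 FLOP
implied_mfu = flops / (REPORTED_GPUH * 3600 * C_H100)
print(f"implied MFU ≈ {implied_mfu:.0%}")  # ~33%, within the 25–50% range
```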


Assumptions and Limitations

Results produced by this simulator are theoretical modeling estimates. They do not constitute a direct measurement of actual emissions.

Main sources of uncertainty include:

  • Actual model characteristics (often confidential): training data, effective MFU, number of hidden dimensions.
  • Lack of reliable LCA data for certain AI-specific hardware.
  • TPU, FPGA, and ASIC specificities are not accounted for.
  • Memory adequacy between the model and selected hardware is not verified.

This method is suited for relative scenario comparison, AI project scoping, and prospective assessment — not for certified emissions reporting.


Bibliography

[1] Schwartz, R., et al. (2020). Green AI. Communications of the ACM. arXiv
[2] IEA (2024). Energy and AI. iea.blob.core.windows.net
[3] ISO 14040/14044. Environmental management — Life Cycle Assessment. 
[4] ITU L.1410. Methodology for the assessment of the environmental life cycle impact of ICT goods, networks and services. 
[5] Meta (2024). The Llama 3 Herd of Models. arXiv
[6] Digital4Better. Open Data Repository. digital4better.github.io/data
[7] Digital4Better. Applied Methodology for Generative AI. digital4better.github.io/methodology/ai
[8] NVIDIA (2025). Llama 3.1 70B DGXC Benchmarking.