One line of code can raise energy use by up to 147%. Install our GitHub Bot and it audits every PR automatically, catching issues that even experienced engineers miss.
load_in_8bit=True · batch_size=1
Common quantization advice is wrong. We measured it.
+147% energy · -76% throughput · +95.7% waste
Saves energy · +84% throughput · Data-backed
Zero configuration. No code changes needed.
Push code with LLM quantization configs (BitsAndBytesConfig, etc.) and create a PR as usual.
The Bot scans the diff, flags energy waste patterns, and posts a comment with data-backed fixes.
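To give a feel for what scanning a diff can look like, here is a minimal sketch that checks the added lines of a unified diff for one pattern. It is not the Bot's actual implementation: the function names, the regular expressions, and the sample diff are all assumptions made for illustration.

```python
import re

# Hypothetical rule: 8-bit loading is enabled but llm_int8_threshold is not set to 0.0.
LOAD_8BIT = re.compile(r"load_in_8bit\s*=\s*True")
THRESHOLD_ZERO = re.compile(r"llm_int8_threshold\s*=\s*0(?:\.0+)?(?![\d.])")


def added_lines(diff_text):
    """Yield the lines a unified diff adds, without the leading '+'."""
    for line in diff_text.splitlines():
        # '+++' marks the new-file header, not an added line.
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def scan_diff(diff_text):
    """Return a list of warnings for the added lines of a diff."""
    added = "\n".join(added_lines(diff_text))
    findings = []
    if LOAD_8BIT.search(added) and not THRESHOLD_ZERO.search(added):
        findings.append("load_in_8bit=True without llm_int8_threshold=0.0")
    return findings


# Made-up diff used only to exercise the sketch.
sample_diff = """\
--- a/model.py
+++ b/model.py
@@ -1,2 +1,3 @@
 from transformers import BitsAndBytesConfig
+config = BitsAndBytesConfig(load_in_8bit=True)
"""
print(scan_diff(sample_diff))
```

A text match like this only sees what the diff adds, so it can miss a threshold that is already set elsewhere in the file. A real auditor would likely need more context than this sketch uses.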
Automatic PR comment with prioritized issues and fixes
Scanned 1 Python file(s) in this PR. 1 critical issue(s) found.
load_in_8bit=True without llm_int8_threshold=0.0 causes 17–147% energy waste due to INT8↔FP16 type conversion at every linear layer.
Fix: llm_int8_threshold=0.0
Processing prompts in a loop wastes up to 95.7% energy. Use batched inference or vLLM continuous batching.
📊 Based on 93+ measurements across RTX 4090D / A800 / RTX 5090 · View full data
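The two fixes named in the comment above can be sketched in a few lines. This is an illustration under stated assumptions, not code the Bot generates: `quant_kwargs`, `batched`, `run_batched`, `fake_generate`, and the default batch size of 8 are hypothetical, and `fake_generate` stands in for whatever batched model call you actually use.

```python
# Keyword arguments accepted by transformers.BitsAndBytesConfig.
# Setting llm_int8_threshold=0.0 alongside load_in_8bit=True is the suggested fix.
quant_kwargs = {"load_in_8bit": True, "llm_int8_threshold": 0.0}
# With transformers installed: BitsAndBytesConfig(**quant_kwargs)


def batched(prompts, batch_size):
    """Split prompts into consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]


def run_batched(prompts, generate_batch, batch_size=8):
    """Call generate_batch once per batch instead of once per prompt."""
    outputs = []
    for batch in batched(prompts, batch_size):
        outputs.extend(generate_batch(batch))
    return outputs


def fake_generate(batch):
    """Placeholder for a real batched model call (assumption for this sketch)."""
    return [prompt.upper() for prompt in batch]


prompts = ["a", "b", "c", "d", "e"]
print(run_batched(prompts, fake_generate, batch_size=2))
```

Five prompts with a batch size of 2 result in three calls rather than five. The right batch size depends on your model and GPU memory, so treat the default here as a placeholder.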
Use our GitHub Action for deeper pipeline control — hardware detection, baseline calibration, and energy regression gating.
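As a rough picture of what energy regression gating means, the sketch below compares a measured energy figure against a calibrated baseline and fails when the increase exceeds a tolerance. It is not the Action's actual logic: the function names, the use of joules, and the 5% default tolerance are assumptions chosen for illustration.

```python
def energy_regression(baseline_joules, measured_joules):
    """Relative change in energy versus the baseline (0.10 means +10%)."""
    if baseline_joules <= 0:
        raise ValueError("baseline_joules must be positive")
    return (measured_joules - baseline_joules) / baseline_joules


def passes_gate(baseline_joules, measured_joules, max_regression=0.05):
    """Return True when the energy increase stays within the allowed tolerance."""
    return energy_regression(baseline_joules, measured_joules) <= max_regression


# Hypothetical numbers: a 4% increase passes, a 147% increase does not.
print(passes_gate(100.0, 104.0))
print(passes_gate(100.0, 247.0))
```

In a CI pipeline, a failing gate would typically translate into a non-zero exit code so the check blocks the merge.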
GitHub Bot: one-click install, automatic PR audit
GitHub Action: CI/CD pipeline, team-level control
6 energy waste patterns, backed by real GPU measurements
To uninstall, find ecocompute-energy-auditor in your installed GitHub Apps, then click Configure → Uninstall.
Install the Bot in 60 seconds. Next time you open a PR, you'll know exactly where your energy goes.
Works with any public or private repo · Supports GitHub Organizations