"Quantization doesn't always save energy. See for yourself."
Every quantization tool tells you how to quantize. We tell you whether you should.
From the working paper “Weight-Only Quantization Does Not Always Save Energy” · under review at Sustainable Computing: Informatics and Systems (2 reviews received).
Fitted crossover curve · your model marked with its uncertainty
Fitted curve · Measured anchors · Your model ± CI · Above zero = penalty · below = savings
Report schema output (JSON)
From static query to constrained optimization: give your GPU, model size, an objective and optional
budgets. The tool exhaustively searches precision × batch × context, filters by your constraints and returns
the objective-optimal config, alternatives and the energy↔latency Pareto frontier. Energy is anchored to
measurements; latency and throughput come from a roofline model; VRAM is computed. Each field is labelled below.
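To make the search concrete, here is a minimal sketch of an exhaustive search over precision × batch × context with budget filtering. It is not the tool's implementation: `Config`, `Estimate`, `search` and `toy_estimate` are hypothetical names, and every number inside `toy_estimate` is a placeholder rather than a measurement or a fitted value from the paper.

```python
from itertools import product
from typing import Callable, List, NamedTuple, Optional, Sequence, Tuple


class Config(NamedTuple):
    precision: str
    batch: int
    context: int


class Estimate(NamedTuple):
    energy_j: float
    latency_ms: float
    vram_gb: float


def search(
    precisions: Sequence[str],
    batches: Sequence[int],
    contexts: Sequence[int],
    estimate: Callable[[Config], Estimate],
    objective: str = "energy_j",
    max_latency_ms: Optional[float] = None,
    max_vram_gb: Optional[float] = None,
) -> List[Tuple[Config, Estimate]]:
    """Enumerate every config, drop those over budget, rank by the objective."""
    feasible = []
    for p, b, c in product(precisions, batches, contexts):
        cfg = Config(p, b, c)
        est = estimate(cfg)
        if max_latency_ms is not None and est.latency_ms > max_latency_ms:
            continue
        if max_vram_gb is not None and est.vram_gb > max_vram_gb:
            continue
        feasible.append((cfg, est))
    # Lowest objective value first: index 0 is the recommendation,
    # the rest are the alternatives.
    feasible.sort(key=lambda pair: getattr(pair[1], objective))
    return feasible


def toy_estimate(cfg: Config) -> Estimate:
    """Placeholder cost model. All constants are invented for illustration."""
    bytes_per_weight = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}[cfg.precision]
    dequant_overhead_ms = {"fp16": 0.0, "int8": 3.0, "int4": 12.0}[cfg.precision]
    vram_gb = 7.0 * bytes_per_weight + 1e-5 * cfg.batch * cfg.context
    latency_ms = 10.0 * bytes_per_weight + dequant_overhead_ms + 0.01 * cfg.context
    energy_j = latency_ms * (1.0 + 0.5 / cfg.batch)
    return Estimate(energy_j, latency_ms, vram_gb)


ranked = search(
    ["fp16", "int8", "int4"], [1, 8], [512, 2048],
    toy_estimate, objective="energy_j", max_vram_gb=10.0,
)
best_config, best_estimate = ranked[0]
```

Passing the cost model in as a callable keeps the search independent of where each field comes from, so a measurement-anchored energy estimate and a roofline latency estimate could be swapped in without touching the loop.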
Recommended configuration
Alternatives
Energy ↔ latency Pareto frontier
Non-dominated trade-offs: no other config is both lower-energy and lower-latency.
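The non-dominance rule can be sketched in a few lines. This is an illustrative implementation that follows the definition as stated (a config is dropped only if some other config is strictly lower in both energy and latency); the function name `pareto_frontier` and the sample points are hypothetical, not values from the tool.

```python
from typing import List, Tuple

# (name, energy, latency); units are whatever the caller uses consistently.
Point = Tuple[str, float, float]


def pareto_frontier(configs: List[Point]) -> List[Point]:
    """Return the configs that no other config beats on both energy and latency."""
    frontier = []
    for name, energy, latency in configs:
        dominated = any(
            e < energy and l < latency for _, e, l in configs
        )
        if not dominated:
            frontier.append((name, energy, latency))
    # Sort by energy so the frontier reads left to right on an energy axis.
    return sorted(frontier, key=lambda c: c[1])


# Made-up sample points for illustration only.
points = [
    ("a", 1.0, 5.0),
    ("b", 2.0, 3.0),
    ("c", 3.0, 4.0),  # dominated by "b": higher energy and higher latency
    ("d", 4.0, 1.0),
]
frontier = pareto_frontier(points)
```

The pairwise check is quadratic in the number of configs, which is fine for a small precision × batch × context grid; a sort-and-sweep pass would scale better for large grids.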