Weight-Only Quantization Does Not Always Save Energy: An Empirical Study of LLM Inference Across NVIDIA GPU Platforms

Name: Weight-Only Quantization Energy Study
Creator: Hongping Zhang
License: https://opensource.org/licenses/MIT