LLM Energy Benchmark
Supplemental Update (published 2026-04-18)

Qwen2.5-3B on Tesla T4: New Measurement Results

This update adds a focused Tesla T4 benchmark for Qwen2.5-3B that compares FP16 and NF4 precision at batch sizes 1, 2, and 4. The new data further supports the crossover effect for smaller models on Turing GPUs: for this model, NF4 uses more energy per token than FP16 at every tested batch size.

Model: Qwen2.5-3B
GPU: Tesla T4
Key result: NF4 uses 7.4% to 39.9% more energy per token than FP16

Table 8: Detailed Results (Energy per token)

Model       GPU       Precision  Batch size  Energy (mJ/token)  CV     NF4 vs FP16
Qwen2.5-3B  Tesla T4  FP16       1           2840.777           1.57%  -
Qwen2.5-3B  Tesla T4  FP16       2           1403.612           2.33%  -
Qwen2.5-3B  Tesla T4  FP16       4           731.003            1.97%  -
Qwen2.5-3B  Tesla T4  NF4        1           3051.425           7.18%  +7.4%
Qwen2.5-3B  Tesla T4  NF4        2           1963.671           5.32%  +39.9%
Qwen2.5-3B  Tesla T4  NF4        4           938.072            1.68%  +28.4%
Qwen2.5-3B  Tesla T4  INT8       1           -                  -      failed
Qwen2.5-3B  Tesla T4  INT8       2           -                  -      failed
Qwen2.5-3B  Tesla T4  INT8       4           -                  -      failed

CV: coefficient of variation. NF4 vs FP16: change in energy per token relative to FP16 at the same batch size.
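The CV column indicates run-to-run variability. The following is a minimal sketch of how such a value is commonly computed. It assumes CV is the sample standard deviation of repeated energy-per-token measurements divided by their mean; this page does not state the exact formula, or whether a sample or population standard deviation is used.

```python
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """Return the coefficient of variation in percent.

    Assumption (not stated on this page): CV = sample standard deviation
    of repeated energy-per-token runs divided by their mean.
    """
    if len(samples) < 2:
        # statistics.stdev needs at least two data points
        raise ValueError("need at least two repeated runs")
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return statistics.stdev(samples) / mean * 100.0
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population variant, which yields a slightly lower CV for small run counts.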

Interpretation: For this 3B model on Tesla T4, NF4 quantization increases energy per token relative to FP16 at all three tested batch sizes, by 7.4% at batch size 1, 39.9% at batch size 2, and 28.4% at batch size 4.
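The NF4 vs FP16 column is a relative change, (E_NF4 - E_FP16) / E_FP16 × 100, where positive values mean NF4 uses more energy. The short check below recomputes it using only the rounded mean energies from Table 8. The dictionary names and the helper function are illustrative.

```python
# Mean energy per token (mJ/token), copied from Table 8.
FP16: dict[int, float] = {1: 2840.777, 2: 1403.612, 4: 731.003}
NF4: dict[int, float] = {1: 3051.425, 2: 1963.671, 4: 938.072}


def relative_change_pct(baseline: float, candidate: float) -> float:
    """Percent change of candidate vs baseline; positive means more energy."""
    return (candidate - baseline) / baseline * 100.0


for batch in sorted(FP16):
    delta = relative_change_pct(FP16[batch], NF4[batch])
    print(f"batch {batch}: NF4 vs FP16 {delta:+.1f}%")
```

This prints +7.4% for batch size 1 and +39.9% for batch size 2, matching the table. For batch size 4, the rounded means give +28.3% rather than the listed +28.4%. The published figure may therefore have been derived from slightly different inputs.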

Note: INT8 runs for Qwen2.5-3B on this T4 setup were not stable or reproducible in this measurement batch and are marked as failed in Table 8. The quantitative comparison therefore uses only FP16 and NF4.

Figure 5 Summary