Supplemental Update
Published: 2026-04-18
Qwen2.5-3B on Tesla T4: New Measurement Results
This update adds a focused Tesla T4 benchmark for Qwen2.5-3B, comparing FP16 and NF4 energy per token at batch sizes 1, 2, and 4. The new data is consistent with the crossover effect previously observed for smaller models on Turing GPUs.
Model: Qwen2.5-3B
GPU: Tesla T4
Key Result: NF4 uses 7.4% to 39.9% more energy per token than FP16
Table 8: Detailed Results (Energy per token)
| Model | GPU | Precision | Batch Size | Energy (mJ/token) | Coefficient of variation (CV) | NF4 vs FP16 (energy) |
|---|---|---|---|---|---|---|
| Qwen2.5-3B | Tesla T4 | FP16 | 1 | 2840.777 | 1.57% | - |
| Qwen2.5-3B | Tesla T4 | FP16 | 2 | 1403.612 | 2.33% | - |
| Qwen2.5-3B | Tesla T4 | FP16 | 4 | 731.003 | 1.97% | - |
| Qwen2.5-3B | Tesla T4 | NF4 | 1 | 3051.425 | 7.18% | +7.4% |
| Qwen2.5-3B | Tesla T4 | NF4 | 2 | 1963.671 | 5.32% | +39.9% |
| Qwen2.5-3B | Tesla T4 | NF4 | 4 | 938.072 | 1.68% | +28.4% |
| Qwen2.5-3B | Tesla T4 | INT8 | 1 | - | - | failed |
| Qwen2.5-3B | Tesla T4 | INT8 | 2 | - | - | failed |
| Qwen2.5-3B | Tesla T4 | INT8 | 4 | - | - | failed |
Interpretation: For this 3B model on the Tesla T4, NF4 quantization increases energy per token relative to FP16 at every tested batch size (1, 2, and 4).
Note: INT8 runs for Qwen2.5-3B on this T4 setup were not stable or reproducible in this measurement batch, so only FP16 and NF4 are used for the quantitative comparison.
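The CV column in Table 8 is read here as the coefficient of variation across repeated runs. The report does not state how it was computed, so the following is only a minimal sketch that assumes the sample standard deviation divided by the mean. The function name and the readings in `runs` are hypothetical and are not measured data.

```python
import statistics


def coefficient_of_variation_pct(samples):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(samples)
    # statistics.stdev uses the sample (n - 1) estimator; this choice is an
    # assumption, since the report does not say which estimator was used.
    return statistics.stdev(samples) / mean * 100.0


# Hypothetical per-run energy readings in mJ/token (illustrative only).
runs = [100.0, 102.0, 98.0, 101.0, 99.0]
cv = coefficient_of_variation_pct(runs)
print(f"CV = {cv:.2f}%")
```

Under this definition, a lower CV means the repeated runs agree more closely, which is one way to read the 7.18% value for NF4 at batch size 1 against the values below 2.5% for FP16.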
Figure 5 Summary
- Batch 1: NF4 energy is 7.4% higher than FP16.
- Batch 2: NF4 energy is 39.9% higher than FP16.
- Batch 4: NF4 energy is 28.4% higher than FP16.
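The percentages above can be recomputed from the energy values in Table 8 as the relative difference between NF4 and FP16. The sketch below assumes that definition; the dictionary and function names are illustrative. Using the rounded table values, it gives about 7.4% and 39.9% for batch sizes 1 and 2, and about 28.3% for batch size 4, a 0.1-point difference from the published 28.4% whose origin the table alone does not show.

```python
# Energy per token (mJ/token) copied from Table 8, keyed by batch size.
fp16 = {1: 2840.777, 2: 1403.612, 4: 731.003}
nf4 = {1: 3051.425, 2: 1963.671, 4: 938.072}


def relative_overhead_pct(baseline, candidate):
    """Percent change of candidate relative to baseline (positive = more energy)."""
    return (candidate - baseline) / baseline * 100.0


# NF4 overhead relative to FP16 for each batch size.
overheads = {batch: relative_overhead_pct(fp16[batch], nf4[batch]) for batch in fp16}

for batch, pct in overheads.items():
    print(f"batch {batch}: NF4 vs FP16 = {pct:+.1f}%")
```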