LLM Energy Benchmark
End-to-End Baseline | FP16 Reference Point | Published: 2026-06-08

RTX PRO 6000 Blackwell: FP16 End-to-End Baseline for Qwen2.5-3B

This update adds a standard FP16 end-to-end reference measurement for Qwen2.5-3B on the NVIDIA RTX PRO 6000 Blackwell Server Edition. Unlike the earlier phase-separated backend-compatibility case study, these measurements are reported in the main leaderboard unit: joules per 1,000 generated tokens (J/1k tok).
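As a minimal sketch of the unit, assuming a run's energy is measured in joules over the whole generation call (the prefill energy being included is an assumption) and divided only by the number of generated tokens, J/1k tok can be computed either from total energy or, equivalently, from average power and throughput. The function names below are illustrative and not taken from the benchmark's code.

```python
def joules_per_1k_tokens(energy_j: float, generated_tokens: int) -> float:
    """Energy per 1,000 generated tokens, given a run's total energy in joules."""
    if generated_tokens <= 0:
        raise ValueError("generated_tokens must be positive")
    return energy_j / generated_tokens * 1000.0


def joules_per_1k_from_power(avg_power_w: float, throughput_tok_s: float) -> float:
    """Equivalent form: watts divided by tokens/s gives joules per token."""
    if throughput_tok_s <= 0:
        raise ValueError("throughput_tok_s must be positive")
    return avg_power_w / throughput_tok_s * 1000.0


# Hypothetical run: 680 J consumed while generating 256 tokens.
print(joules_per_1k_tokens(680.0, 256))  # 2656.25
```

The two forms agree for a single run; across many runs, averaging per-run J/1k tok and dividing averaged power by averaged throughput can differ slightly.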

GPU: RTX PRO 6000 (Blackwell Server Edition, 96GB)
Model: Qwen2.5-3B (FP16, batch size 1)
Protocol: End-to-end, 256 and 512 generated tokens
Runs: n=10, 2 warmup runs

Experimental Setup

Item | Value
GPU | NVIDIA RTX PRO 6000 Blackwell Server Edition
CUDA compute capability | 12.0
Software stack | PyTorch 2.11.0+cu128, CUDA runtime 12.8
Model path used in run | /root/autodl-tmp/models/Qwen2.5-3B
Precision | FP16
Batch size | 1
Prompt length | 128 tokens
Generated lengths | 256 and 512 tokens
Power measurement | NVML (available during run)
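The protocol above (2 discarded warmups, then 10 measured end-to-end runs, each reduced to energy, throughput and average power) can be sketched as a small harness. This is an illustration under stated assumptions, not the benchmark's actual script: `generate` is a hypothetical callable that runs one generation and returns the number of generated tokens, `read_energy_mj` is a hypothetical cumulative energy counter in millijoules, and the use of the sample standard deviation (n - 1) is also an assumption.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple


def mean_std(values: List[float]) -> Tuple[float, float]:
    # Sample standard deviation (n - 1); the benchmark's estimator is an assumption.
    return statistics.mean(values), statistics.stdev(values)


def measure_runs(
    generate: Callable[[], int],
    read_energy_mj: Callable[[], float],
    n_runs: int = 10,
    n_warmup: int = 2,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, Tuple[float, float]]:
    """Run warmups (discarded), then n_runs measured end-to-end generations.

    generate: hypothetical callable performing one generation (prefill and
        decode) and returning the number of generated tokens. On a GPU it
        should synchronize before returning so readings bracket finished work.
    read_energy_mj: hypothetical cumulative energy counter in millijoules.
    """
    if n_runs < 2:
        raise ValueError("need at least two measured runs for a standard deviation")

    for _ in range(n_warmup):
        generate()  # warmup output is not recorded

    energy, throughput, power = [], [], []
    for _ in range(n_runs):
        e0, t0 = read_energy_mj(), clock()
        tokens = generate()
        e1, t1 = read_energy_mj(), clock()
        joules, seconds = (e1 - e0) / 1000.0, t1 - t0
        energy.append(joules / tokens * 1000.0)  # J per 1k generated tokens
        throughput.append(tokens / seconds)      # generated tokens per second
        power.append(joules / seconds)           # average power in watts

    return {
        "energy_j_per_1k": mean_std(energy),
        "throughput_tok_s": mean_std(throughput),
        "avg_power_w": mean_std(power),
    }


# Usage with a fake workload (no GPU): each call "takes" 3 s and 700 J.
fake_state = {"t": 0.0, "mj": 0.0, "calls": 0}


def fake_generate() -> int:
    fake_state["t"] += 3.0
    fake_state["mj"] += 700_000.0
    fake_state["calls"] += 1
    return 256


demo = measure_runs(fake_generate, lambda: fake_state["mj"], clock=lambda: fake_state["t"])
print(demo["energy_j_per_1k"])  # (2734.375, 0.0)
```

On a real GPU, `read_energy_mj` could wrap `pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)` from the nvidia-ml-py package, which reports cumulative energy in millijoules on Volta and newer GPUs; whether this benchmark read that counter or integrated sampled power is not stated here.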

FP16 End-to-End Results

Model | GPU | Precision | Batch | Generated Tokens | Energy (J/1k tok) | Throughput (tok/s) | Average Power (W) | Runs
Qwen2.5-3B | RTX PRO 6000 Blackwell | FP16 | 1 | 256 | 2,660.83 ± 124.87 | 83.05 ± 0.38 | 221.58 ± 10.12 | 10
Qwen2.5-3B | RTX PRO 6000 Blackwell | FP16 | 1 | 512 | 2,709.67 ± 38.22 | 83.17 ± 0.26 | 225.61 ± 2.90 | 10
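Because watts divided by tokens per second gives joules per token, the three reported means can be cross-checked against each other. The sketch below recomputes energy from the average power and throughput columns; both implied values land within 0.3% of the reported energy. A small positive gap is plausible if energy was averaged per run rather than derived from the averaged columns (the mean of ratios generally differs from the ratio of means), but that per-run averaging is an assumption about the pipeline.

```python
# Reported means from the FP16 end-to-end results table.
ROWS = {
    256: {"energy_j_per_1k": 2660.83, "throughput_tok_s": 83.05, "avg_power_w": 221.58},
    512: {"energy_j_per_1k": 2709.67, "throughput_tok_s": 83.17, "avg_power_w": 225.61},
}


def implied_energy_j_per_1k(avg_power_w: float, throughput_tok_s: float) -> float:
    # W / (tok/s) = J/tok; multiply by 1,000 for J/1k tokens.
    return avg_power_w / throughput_tok_s * 1000.0


for tokens, row in ROWS.items():
    implied = implied_energy_j_per_1k(row["avg_power_w"], row["throughput_tok_s"])
    gap = (implied - row["energy_j_per_1k"]) / row["energy_j_per_1k"]
    print(f"{tokens} tokens: reported {row['energy_j_per_1k']:.2f}, "
          f"implied {implied:.2f}, gap {gap:+.2%}")
```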

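The two generation lengths produce nearly identical throughput (83.05 vs. 83.17 tok/s) and energy per 1,000 tokens within about 2% of each other (2,660.83 vs. 2,709.67 J/1k tok), providing a stable FP16 anchor point for the RTX PRO 6000 Blackwell under this end-to-end protocol.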
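The "similar" comparison can be made concrete with a few lines of arithmetic on the reported means. This is a rough check only, not a significance test: it computes the relative differences between the 512- and 256-token configurations and asks whether the difference in mean energy is smaller than the run-to-run spread (one standard deviation) of the noisier 256-token result.

```python
# Reported means and standard deviations (energy in J/1k tok, throughput in tok/s).
energy_256, std_256 = 2660.83, 124.87
energy_512 = 2709.67
tput_256, tput_512 = 83.05, 83.17

energy_gap = (energy_512 - energy_256) / energy_256        # relative energy difference
throughput_gap = (tput_512 - tput_256) / tput_256          # relative throughput difference

print(f"energy: {energy_gap:+.1%}, throughput: {throughput_gap:+.2%}")
# Rough check: is the difference in means below one SD of the 256-token result?
print("difference < 256-token SD:", abs(energy_512 - energy_256) < std_256)
```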

Reported ± values are one standard deviation across the 10 runs. The 256-token energy result has a coefficient of variation (CV) of approximately 4.7%, above the 3% CV threshold used for the main benchmark; the 512-token energy result has a CV of approximately 1.4%, within that threshold.
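The CV figures follow directly from the table as standard deviation divided by mean. A minimal sketch of the threshold check, using the reported energy values and the 3% criterion stated above (the constant and function names are illustrative):

```python
CV_THRESHOLD = 0.03  # main-benchmark stability criterion: CV < 3%


def coefficient_of_variation(mean: float, std: float) -> float:
    """CV = standard deviation / mean (dimensionless)."""
    return std / mean


results = {
    "256 tokens": (2660.83, 124.87),  # energy J/1k tok: mean, 1 SD
    "512 tokens": (2709.67, 38.22),
}

for label, (mean, std) in results.items():
    cv = coefficient_of_variation(mean, std)
    status = "meets" if cv < CV_THRESHOLD else "exceeds"
    print(f"{label}: CV = {cv:.1%} ({status} the <3% threshold)")
```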

Interpretation and Relationship to the Blackwell Phase Study

This FP16 end-to-end result should be kept separate from the June 3 phase-separated measurements. The present update reports FP16 inference only, in the standard leaderboard unit of J/1k generated tokens. The June 3 update instead examined prefill/decode energy and the behavior of the quantized-path backend under bitsandbytes 0.49.2.

Together, the two updates support a conservative interpretation: the RTX PRO 6000 Blackwell stack produced stable FP16 end-to-end inference, while the earlier quantized phase-separated results should be read as evidence of a backend-architecture interaction rather than as representative quantized leaderboard performance.
