End-to-End Baseline | FP16 Reference Point | Published: 2026-06-08

RTX PRO 6000 Blackwell: FP16 End-to-End Baseline for Qwen2.5-3B

This update adds a standard FP16 end-to-end reference measurement for Qwen2.5-3B on the NVIDIA RTX PRO 6000 Blackwell Server Edition. Unlike the earlier phase-separated backend-compatibility case study, these measurements use the main leaderboard unit of Joules per 1,000 generated tokens.


GPU: RTX PRO 6000 (Blackwell Server Edition, 96GB)
Model: Qwen2.5-3B (FP16, batch size 1)
Protocol: End-to-End (256 and 512 generated tokens)
Runs: n=10 (2 warmup runs)

Experimental Setup

Item	Value
GPU	NVIDIA RTX PRO 6000 Blackwell Server Edition
CUDA capability	12.0
Software stack	PyTorch 2.11.0+cu128, CUDA runtime 12.8
Model path used in run	/root/autodl-tmp/models/Qwen2.5-3B
Precision	FP16
Batch size	1
Prompt length	128 tokens
Generated lengths	256 and 512 tokens
Power measurement	NVML available during run
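The setup table states only that NVML was available for power measurement; it does not describe how samples were collected or integrated. As a minimal sketch of one common approach, the snippet below integrates timestamped power samples with the trapezoidal rule and normalizes to Joules per 1,000 generated tokens. The function names, the trapezoidal rule, and the sample values are illustrative assumptions, not the method or data used in these runs.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate power samples (W) over time (s) using the trapezoidal rule."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two matching samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def joules_per_1k_tokens(energy_j, generated_tokens):
    """Normalize total energy to the leaderboard unit, J per 1,000 generated tokens."""
    return 1000.0 * energy_j / generated_tokens


# Synthetic samples (NOT measured data): a constant 200 W draw over 3 seconds
# while a hypothetical run generates 256 tokens.
sample_times = [0.0, 1.0, 2.0, 3.0]
sample_power = [200.0, 200.0, 200.0, 200.0]

run_energy = energy_joules(sample_times, sample_power)
run_j_per_1k = joules_per_1k_tokens(run_energy, 256)
print(f"{run_energy:.1f} J total, {run_j_per_1k:.2f} J/1k tok")
```

In a real harness the samples would come from periodic NVML power readings taken during the timed run, with the warmup runs excluded from the integration.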

FP16 End-to-End Results

Model	GPU	Precision	Batch	Generated Tokens	Energy (J/1k tok)	Throughput (tok/s)	Average Power (W)	Runs
Qwen2.5-3B	RTX PRO 6000 Blackwell	FP16	1	256	2,660.83 ± 124.87	83.05 ± 0.38	221.58 ± 10.12	10
Qwen2.5-3B	RTX PRO 6000 Blackwell	FP16	1	512	2,709.67 ± 38.22	83.17 ± 0.26	225.61 ± 2.90	10

The two generation lengths produce nearly identical throughput (83.05 vs. 83.17 tok/s) and energy per 1,000 tokens within about 2% of each other (2,660.83 vs. 2,709.67 J/1k tok), providing a stable FP16 anchor point for RTX PRO 6000 Blackwell under this end-to-end protocol.
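The three reported columns can be cross-checked against each other, since energy per 1,000 tokens should be roughly 1,000 times average power divided by throughput. The sketch below applies that relation to the reported means. It assumes the table's energy values are means of per-run energies, so the ratio of mean power to mean throughput is expected to agree only approximately; the function name is illustrative.

```python
# Reported means from the FP16 end-to-end results table.
reported = {
    256: {"energy_j_per_1k": 2660.83, "throughput_tok_s": 83.05, "power_w": 221.58},
    512: {"energy_j_per_1k": 2709.67, "throughput_tok_s": 83.17, "power_w": 225.61},
}


def implied_energy_per_1k(power_w, throughput_tok_s):
    """Energy per 1,000 tokens implied by average power (W) and throughput (tok/s)."""
    return 1000.0 * power_w / throughput_tok_s


relative_gaps = {}
for tokens, row in reported.items():
    implied = implied_energy_per_1k(row["power_w"], row["throughput_tok_s"])
    gap = (implied - row["energy_j_per_1k"]) / row["energy_j_per_1k"]
    relative_gaps[tokens] = gap
    print(f"{tokens} tokens: implied {implied:.2f} J/1k tok, relative gap {gap:+.2%}")
```

For both rows the implied value lands within half a percent of the reported energy, which is consistent with the columns describing the same runs.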

Reported ± values are one standard deviation across 10 runs. The 256-token energy result has a coefficient of variation (CV, standard deviation divided by mean) of approximately 4.7%, above the <3% threshold used for the main benchmark; the 512-token energy result has a CV of approximately 1.4%.
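The CV figures follow directly from the reported means and standard deviations. A short sketch of the arithmetic, using the energy values from the results table and the 3% threshold stated above (the variable and function names are illustrative):

```python
CV_THRESHOLD_PERCENT = 3.0  # main-benchmark threshold stated in the text


def cv_percent(mean, std):
    """Coefficient of variation as a percentage: 100 * std / mean."""
    return 100.0 * std / mean


# (mean, one standard deviation) of energy in J/1k tok, from the results table.
energy_stats = {
    256: (2660.83, 124.87),
    512: (2709.67, 38.22),
}

energy_cv = {tokens: cv_percent(mean, std) for tokens, (mean, std) in energy_stats.items()}

for tokens, cv in energy_cv.items():
    status = "within" if cv < CV_THRESHOLD_PERCENT else "above"
    print(f"{tokens} tokens: CV = {cv:.1f}% ({status} the {CV_THRESHOLD_PERCENT:.0f}% threshold)")
```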

Interpretation and Relationship to the Blackwell Phase Study

This FP16 end-to-end result should be treated separately from the June 3 phase-separated measurements. The present update uses the standard leaderboard unit, J/1k generated tokens, and measures FP16 inference only. The June 3 update instead examined prefill/decode energy and quantized-path backend behavior under bitsandbytes 0.49.2.

Together, the two updates support a conservative interpretation: the RTX PRO 6000 Blackwell stack produced stable FP16 end-to-end inference, while the earlier quantized phase-separated results should be interpreted as backend-architecture interaction evidence rather than normal quantized leaderboard performance.

