Benchmarking

The BenchmarkRunner provides reproducible, tiered experiments with checkpointing, automatic plot generation, and a rich summary table.

Quick Start

# Run both versions, noiseless, auto-detect backend
python experiments/run_full_benchmark.py

# Medium tier, Quantum Enhanced only, with noise
python experiments/run_full_benchmark.py --tier medium --version B --noise

# Compare compute backends
python experiments/run_full_benchmark.py --compare-backends

Version Labels

| Flag | Internal | Human Label | Description |
|------|----------|-------------|-------------|
| --version A | Version A | Classical Parity (5q) | Same algorithm as classical RLSTC, but with quantum distance computation. Direct 1:1 comparison baseline. |
| --version B | Version B | Quantum Enhanced (8q) | Additional quantum features: 8 qubits, data reuploading, richer encoding, optimised readout. |
| --version both | A + B | Both | Default; runs both versions for side-by-side comparison. |

Tiers

| Tier | Trajectories | Epochs | Approx. Time |
|--------|--------------|--------|--------------|
| small | 20 | 3 | ~5 min |
| medium | 50 | 8 | ~15 min |
| large | 100 | 15 | ~45+ min |

CLI Flags

--tier small|medium|large     Benchmark tier (default: small)
--version A|B|both            Version selection (default: both)
--noise                       Include Eagle + Heron noise runs
--backend auto|cpu|mlx|cuda   Compute backend (default: auto)
--compare-backends            Run on all available backends
--seed N                      Random seed (default: 42)
--resume                      Resume from checkpoint
--output-dir PATH             Override output directory
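
For scripted use, the choice flags map onto the BenchmarkRunner constructor documented in the Python API section below. Here is a minimal sketch of the programmatic equivalent of the second Quick Start command, using only the constructor arguments that section confirms (the Python-side names for --seed and --output-dir are not documented here, so they are omitted):

from q_rlstc.visualization import BenchmarkRunner

# Equivalent of: python experiments/run_full_benchmark.py --tier medium --version B --noise
runner = BenchmarkRunner(
    tier="medium",           # --tier medium
    versions=["B"],          # --version B
    compute_backend="auto",  # --backend auto (the default)
    include_noise=True,      # --noise
)
results = runner.run()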

Summary Table

The benchmark prints a 13-column summary table:

Run                          Qubits  Params Episodes  Init OD  Final OD      ΔOD  OD Impr%      F1   AvgRew  Conv.Ep  ParamEff     Time
Classical Parity (5q) [noiseless]     5      20       60   1.2345   0.8901   0.3444    27.9% 0.7234   0.4521       23  0.022605    12.3s
Quantum Enhanced (8q) [noiseless]     8      56       60   1.2345   0.7890   0.4455    36.1% 0.8012   0.5678       18  0.010139    24.7s

Columns:

| Column | Description |
|--------|-------------|
| Run | Version label plus noise status |
| Qubits | Number of qubits in the VQ-DQN circuit |
| Params | Number of trainable variational parameters |
| Episodes | Total training episodes completed |
| Init OD | Overall Distance (OD) at the start of training |
| Final OD | Overall Distance at the end of training |
| ΔOD | Absolute OD improvement (Init OD − Final OD) |
| OD Impr% | OD improvement as a percentage of Init OD |
| F1 | Segmentation F1 score |
| AvgRew | Average reward over the last 10 episodes |
| Conv.Ep | First episode at which 90% of the maximum reward was reached |
| ParamEff | AvgRew divided by the number of parameters (see the sketch below) |
| Time | Wall-clock runtime |
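
The derived columns are simple arithmetic over the run history. A minimal sketch of how ΔOD, OD Impr%, AvgRew, ParamEff, and Conv.Ep are computed, using made-up reward and OD arrays (the variable names are illustrative, not part of the q_rlstc API):

import numpy as np

# Illustrative per-episode rewards and per-epoch Overall Distance values.
rewards = np.array([0.10, 0.22, 0.31, 0.38, 0.42, 0.45, 0.44, 0.46, 0.45, 0.46, 0.45, 0.46])
od = np.array([1.2345, 1.05, 0.96, 0.8901])
n_params = 20

delta_od = od[0] - od[-1]                    # ΔOD: Init OD - Final OD
od_impr_pct = 100.0 * delta_od / od[0]       # OD Impr%: relative improvement
avg_rew = rewards[-10:].mean()               # AvgRew: mean over last 10 episodes
param_eff = avg_rew / n_params               # ParamEff: AvgRew / Params
conv_ep = int(np.argmax(rewards >= 0.9 * rewards.max()))  # Conv.Ep: first episode at >=90% of max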

Checkpointing

The benchmark saves progress after each run to .checkpoint.json in the output directory. Use --resume to continue from where you left off if a run is interrupted.
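
The checkpoint file is plain JSON, so you can inspect it before resuming. A minimal sketch that pretty-prints whatever it contains (the directory name is the example from the Output Structure section, and the file's internal schema is not documented here):

import json
from pathlib import Path

ckpt = Path("outputs/benchmark_20260220_103000/.checkpoint.json")
if ckpt.exists():
    # Pretty-print the recorded state without assuming any particular schema.
    print(json.dumps(json.loads(ckpt.read_text()), indent=2))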

Generated Plots

The benchmark auto-generates these plots in outputs/benchmark_*/plots/:

| File | Content |
|------|---------|
| learning_curves.png | Reward curves, Version A vs. Version B |
| od_convergence.png | OD convergence across epochs |
| metric_comparison.png | Grouped bar chart of all summary metrics |
| epsilon_schedule.png | ε-greedy exploration decay |
| circuit_summary.png | VQ-DQN circuit comparison table |
| noise_impact.png | Noise resilience (generated only with --noise) |

Python API

from q_rlstc.visualization import BenchmarkRunner

runner = BenchmarkRunner(
    tier="medium",
    versions=["A", "B"],
    compute_backend="auto",
    include_noise=False,
)

results = runner.run()
runner.generate_plots(results)
runner.print_summary(results)

Output Structure

outputs/benchmark_20260220_103000/
├── .checkpoint.json
├── metrics_summary.json
├── A_ideal_history.npz
├── B_ideal_history.npz
└── plots/
    ├── learning_curves.png
    ├── od_convergence.png
    ├── metric_comparison.png
    ├── epsilon_schedule.png
    └── circuit_summary.png
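
These artifacts are straightforward to post-process outside the runner. A minimal sketch that loads the summary JSON and enumerates the arrays in a history archive (the keys inside both files are not documented here, so the sketch only lists what it finds):

import json
import numpy as np

run_dir = "outputs/benchmark_20260220_103000"  # example directory from above

with open(f"{run_dir}/metrics_summary.json") as f:
    summary = json.load(f)
print(summary)

# Each *_history.npz is a NumPy archive; list its array names and shapes.
with np.load(f"{run_dir}/A_ideal_history.npz") as hist:
    for name in hist.files:
        print(name, hist[name].shape)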
