# Experimental Setup & Reproducibility Protocol

## 1. Hardware & Software Environment
| Component | Specification |
|---|---|
| Machine | Apple MacBook Pro (M-series) |
| Python | 3.12.x |
| NumPy | 2.4.x |
| Qiskit | 1.x (statevector simulator) |
| OS | macOS |
| Git hash | Recorded per run (see JSON output env.git_hash) |
All experiments call `_collect_env_metadata()` in `run_thesis_experiments.py` to record the exact Python, NumPy, and Qiskit versions at runtime into the JSON output.
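A minimal sketch of what such a collector can look like. The function body and field names here are assumptions, not the project's actual implementation; the real collector would also record `qiskit.__version__`, omitted here to keep the sketch dependency-light:

```python
import json
import platform
import subprocess

import numpy as np


def collect_env_metadata():
    """Record the runtime environment into a JSON-serializable dict.

    Illustrative sketch of the role _collect_env_metadata() plays in
    run_thesis_experiments.py; exact field names are assumptions.
    """
    try:
        git_hash = subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            text=True, stderr=subprocess.DEVNULL,
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        git_hash = "unknown"  # not inside a git checkout
    return {
        "python": platform.python_version(),
        "numpy": np.__version__,
        "os": platform.system(),
        "git_hash": git_hash,
    }


meta = collect_env_metadata()
print(json.dumps(meta, indent=2))
```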
## 2. Dataset
| Property | Value |
|---|---|
| Source | T-drive taxi dataset (Beijing, 2008–2009) |
| Preprocessing | Tdrive_norm_traj (normalized) |
| Cluster centers | tdrive_clustercenter (pre-computed) |
| Training size | 27 trajectories (90% of 30) |
| Validation size | 3 trajectories (10% of 30, fixed partition) |
| State dimensionality | 5 (IED, split_OD, min_dist, segment_length, step_index) |
| Action space | Binary: EXTEND (0) / CUT (1) |
The 90/10 split is deterministic per seed via TrajectoryScheduler(validation_pct=0.1, seed=seed).
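The deterministic split can be sketched as follows; `TrajectoryScheduler`'s internals are not shown in this document, so this is an assumed implementation of the same contract (same seed in, same partition out):

```python
import numpy as np


def split_trajectories(n_traj, validation_pct=0.1, seed=42):
    """Seeded train/validation partition, mirroring the contract of
    TrajectoryScheduler(validation_pct=0.1, seed=seed) (internals assumed)."""
    rng = np.random.default_rng(seed)            # dedicated, repeatable RNG
    order = rng.permutation(n_traj)              # shuffle trajectory indices
    n_val = int(round(n_traj * validation_pct))  # 30 * 0.1 -> 3 validation
    return order[n_val:], order[:n_val]          # train (27), validation (3)


train_idx, val_idx = split_trajectories(30)
print(len(train_idx), len(val_idx))  # -> 27 3
```

Because the RNG is constructed from the seed alone, re-running with the same seed reproduces the identical partition.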
## 3. Protocol Constants (Shared Across All Models)
All agents share identical hyperparameters — the only variable is the policy network architecture.
| Parameter | Value | Rationale |
|---|---|---|
| `batch_size` | 32 | Standard DQN mini-batch size |
| `memory_size` | 5,000 | Fits in RAM; sufficient for the 30-trajectory regime |
| `gamma` (discount) | 0.90 | Sub-episode horizons are ~10–50 steps |
| `huber_delta` | 1.0 | Standard smooth-L1 loss |
| `epsilon_start` | 1.0 | Full exploration initially |
| `epsilon_min` | 0.1 | Retain a 10% exploration floor |
| `epsilon_decay` | 0.99 | Per-episode decay |
| `target_update_freq` | 10 | Episodes between target-network syncs |
| `L_MIN` | 3 | Minimum segment length before CUT is allowed |
| `CUT_PENALTY` | 0.12 | Per-cut reward penalty |
| `EXTEND_COST` | 0.01 | Small per-extend cost to break ties |
| `COMPLEXITY_LAMBDA` | 0.03 | End-of-episode cut-rate regularizer |
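The exploration schedule implied by the three epsilon constants can be written out directly (a sketch of the schedule, assuming simple multiplicative per-episode decay):

```python
def epsilon_at(episode, start=1.0, decay=0.99, floor=0.1):
    """ε after a given number of per-episode decays, clamped at the floor."""
    return max(floor, start * decay ** episode)


# With 54 training episodes (27 trajectories x 2 epochs), the schedule never
# reaches the 0.1 floor: 0.99 ** 54 ≈ 0.58.
print(round(epsilon_at(54), 3))  # -> 0.581
```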
## 4. SPSA Optimizer Configuration
Applied identically to VQ-DQN and all SPSA classical controls.
| Parameter | Value | Source |
|---|---|---|
| a (learning rate scale) | 0.12 | Spall 1998 defaults |
| c (perturbation scale) | 0.08 | Spall 1998 defaults |
| A (stability constant) | 20 | ~10% of expected iterations |
| alpha (LR decay rate) | 0.602 | Spall 1998 theory |
| gamma (pert decay rate) | 0.101 | Spall 1998 theory |
| momentum | 0.9 | m-SPSA variant |
| gradient clip | 1.0 | Prevents exploding updates |
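The table's constants plug into Spall's gain sequences a_k = a/(A+k+1)^α and c_k = c/(k+1)^γ. Below is a sketch of one momentum-SPSA update using exactly those values; the codebase's actual update rule may differ in details, and the toy quadratic at the end is purely illustrative:

```python
import numpy as np


def spsa_step(theta, loss, k, velocity=None, rng=None,
              a=0.12, c=0.08, A=20, alpha=0.602, gamma=0.101,
              momentum=0.9, clip=1.0):
    """One m-SPSA update with the Spall (1998) gain sequences from the table.

    Illustrative sketch; the exact update rule in the codebase may differ.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    a_k = a / (A + k + 1) ** alpha                      # decaying learning rate
    c_k = c / (k + 1) ** gamma                          # decaying perturbation
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # Rademacher directions
    # Exactly two forward evaluations per gradient estimate: θ+c_kδ and θ-c_kδ.
    g_hat = (loss(theta + c_k * delta) - loss(theta - c_k * delta)) / (2.0 * c_k * delta)
    g_hat = np.clip(g_hat, -clip, clip)                 # gradient clip at 1.0
    velocity = momentum * (velocity if velocity is not None else 0.0) - a_k * g_hat
    return theta + velocity, velocity


# Sanity check on a toy quadratic: the loss should shrink over 200 iterations.
theta = np.array([1.0, -1.0, 0.5])
velocity, rng = None, np.random.default_rng(42)
for k in range(200):
    theta, velocity = spsa_step(theta, lambda t: float(np.sum(t ** 2)), k,
                                velocity=velocity, rng=rng)
final_loss = float(np.sum(theta ** 2))
print(f"loss after 200 m-SPSA steps: {final_loss:.5f}")
```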
## 5. Models Under Test
| Model | Kind | Params | Architecture |
|---|---|---|---|
| VQ-DQN (5q×3L) | quantum | 34 | 5-qubit, 3-layer HEA, angle encoding |
| MLP-34 (SPSA) | classical-SPSA | 34 | [4]-hidden MLP (param-matched) |
| MLP-34 (Adam) | classical-Adam | 34 | [4]-hidden MLP (param-matched) |
| Control A (linear) | classical-SPSA | 12 | Linear (no hidden) |
| Control B (h=64) | classical-SPSA | 450 | [64]-hidden MLP |
| Control C (h=32×32) | classical-SPSA | 1,314 | [32,32]-hidden MLP |
| Control D (Adam linear) | classical-Adam | 12 | Linear (no hidden) |
| Control E (Adam h=64) | classical-Adam | 450 | [64]-hidden MLP |
| Control F (Adam h=32×32) | classical-Adam | 1,314 | [32,32]-hidden MLP |
## 6. Evaluation Protocol

### Training Definition
We define one epoch as a full pass over the scheduled training trajectories, each generating one episode per trajectory under ε-greedy exploration. With 27 training trajectories and 2 epochs, each model processes exactly 54 training episodes per seed.
### Compute-Budget Parity
All SPSA-trained models receive identical:
- Environment steps: same trajectories, same step counts
- SPSA iterations: one update per batch_size replay samples
- Forward evaluations: SPSA = 2 evaluations per gradient estimate (θ+δ and θ-δ)
Per-evaluation forward-pass cost differs between the VQC and the MLPs, but this is immaterial to our claims: we report no runtime speedup.
### Validation
- Frequency: After every training epoch
- Policy: Greedy (ε = 0) via `agent.act(obs, greedy=True)`
- Metrics: ValCR = OD / basesim, SSE, CUT%, segment count
- Multi-seed: 5 seeds (42, 123, 7, 99, 2025); report mean ± std
### Best-Epoch Selection
- Criterion: Lowest ValCR across epochs (argmin over epoch index)
- Tie-breaking: Earlier epoch wins (first occurrence of minimum)
- Scope: Per-seed; aggregated across seeds via mean ± std of per-seed bests
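The selection and aggregation rules above map directly onto NumPy primitives. A sketch with illustrative numbers (not real results); note `np.argmin` returns the first index of the minimum, which is exactly the "earlier epoch wins" rule:

```python
import numpy as np

# Per-epoch validation CR for one seed (illustrative, NOT real results).
val_cr = np.array([0.93, 0.88, 0.88, 0.91])

# np.argmin returns the FIRST occurrence of the minimum -> earlier epoch wins.
best_epoch = int(np.argmin(val_cr))
print(best_epoch)  # -> 1

# Aggregation: mean ± std of the per-seed bests (sample std shown here;
# whether the codebase uses ddof=0 or ddof=1 is an assumption).
per_seed_best = np.array([0.88, 0.86, 0.90, 0.87, 0.89])
print(f"{per_seed_best.mean():.3f} ± {per_seed_best.std(ddof=1):.3f}")
```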
### Significance Testing
- Mann-Whitney U test (nonparametric, VQ-DQN vs each control)
- Cohen's d effect size with interpretation labels
- Bootstrap 95% confidence interval on mean difference (10,000 resamples)
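The three tests can be sketched with SciPy and NumPy as below. The sample arrays are placeholders for per-seed best ValCR values, not real results, and the exact options used by `run_significance_test.py` are assumptions:

```python
import numpy as np
from scipy import stats

# Illustrative per-seed best ValCR values (placeholders, NOT real results).
vqdqn = np.array([0.86, 0.87, 0.85, 0.88, 0.86])
control = np.array([0.90, 0.91, 0.89, 0.92, 0.90])

# Mann-Whitney U test (nonparametric, two-sided).
u_stat, p_value = stats.mannwhitneyu(vqdqn, control, alternative="two-sided")

# Cohen's d with a pooled standard deviation.
pooled_sd = np.sqrt((vqdqn.var(ddof=1) + control.var(ddof=1)) / 2.0)
cohens_d = (vqdqn.mean() - control.mean()) / pooled_sd

# Bootstrap 95% CI on the mean difference (10,000 resamples).
rng = np.random.default_rng(0)
boot = [rng.choice(vqdqn, vqdqn.size).mean()
        - rng.choice(control, control.size).mean()
        for _ in range(10_000)]
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"U={u_stat:.0f}  p={p_value:.4f}  d={cohens_d:.2f}  "
      f"95% CI=[{ci_low:.3f}, {ci_high:.3f}]")
```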
## 7. Determinism & Reproducibility Guarantees

### Seeded RNGs
Each experiment seeds the following at run start:
- np.random.seed(seed) — NumPy global RNG
- random.seed(seed) — Python stdlib RNG
- ReplayBuffer(seed=seed) — replay sampling
- TrajectoryScheduler(seed=seed) — train/val split and epoch ordering
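A sketch of the seeding step; the project classes are shown as comments because their constructors take the seed directly, and the helper name is hypothetical:

```python
import random

import numpy as np


def seed_everything(seed: int) -> None:
    """Seed every RNG listed above at run start (illustrative helper)."""
    np.random.seed(seed)  # NumPy global RNG
    random.seed(seed)     # Python stdlib RNG
    # ReplayBuffer(seed=seed)         # replay sampling
    # TrajectoryScheduler(seed=seed)  # train/val split and epoch ordering


seed_everything(42)
first = (np.random.rand(), random.random())
seed_everything(42)
second = (np.random.rand(), random.random())
print(first == second)  # -> True: identical draws after re-seeding
```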
### Qiskit Determinism
Statevector simulation (shots=0) is fully deterministic — no sampling. Shot-based simulations (E3) use Qiskit's internal RNG seeded per circuit execution.
### Run Identification
Each JSON output uniquely identifies a run via:
- args.seed / args.seeds — random seed(s)
- args.amount — dataset size
- args.epochs — training epochs
- args.experiments — experiment IDs
- env.git_hash — code version
- env.timestamp — wall-clock start time
- Per-model: model, kind, noise, shots, params
### What Constitutes a "Run"
A run is uniquely identified by the tuple: (seed, dataset_amount, epochs, model_id, optimizer_kind, shots, noise_model, git_hash). Two runs with the same tuple produce identical results under statevector simulation.
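The identifying tuple can be made explicit as a typed record; the field values below (model id, noise label, hash) are hypothetical placeholders:

```python
from typing import NamedTuple


class RunID(NamedTuple):
    """The tuple that uniquely identifies a run (fields from the text)."""
    seed: int
    dataset_amount: int
    epochs: int
    model_id: str
    optimizer_kind: str
    shots: int        # 0 -> exact statevector simulation
    noise_model: str
    git_hash: str


# Hypothetical example values for illustration only.
run = RunID(42, 30, 2, "vqdqn_5q3l", "spsa", 0, "none", "deadbeef")
# Equal tuples -> identical results under statevector simulation (shots=0).
print(run == RunID(42, 30, 2, "vqdqn_5q3l", "spsa", 0, "none", "deadbeef"))
```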
## 8. Reproducing Results

```bash
# Clone and setup
git clone <repo>
cd q_rlstc
python -m venv .venv && source .venv/bin/activate
pip install -e .

# Full multi-seed E1 experiment (~90 min)
python experiments/run_thesis_experiments.py \
    --experiments E1 \
    --amount 30 --epochs 2 \
    --seeds 42,123,7,99,2025 \
    --output-dir results/thesis_multiseed

# Significance tests
python experiments/run_significance_test.py \
    results/thesis_multiseed/thesis_results_*.json

# Robustness sweeps (shots, noise)
python experiments/run_thesis_experiments.py \
    --experiments E2,E3 \
    --amount 30 --epochs 2 \
    --seeds 42,123,7,99,2025 \
    --output-dir results/thesis_robustness

# Entanglement ablation
python experiments/run_thesis_experiments.py \
    --experiments AB1 \
    --amount 30 --epochs 2 \
    --seeds 42,123,7,99,2025 \
    --output-dir results/thesis_ablation
```
## 9. Metrics Glossary
| Metric | Formula | Interpretation |
|---|---|---|
| OD | mean(IED(segment, center)) | Average segment-to-center distance |
| CR (ValCR) | OD / basesim | Normalized quality (lower = better) |
| SSE | Σᵢ Σₛ∈Cᵢ IED(s, centerᵢ)² | Within-cluster compactness |
| CUT% | cuts / (cuts + extends) × 100 | Segmentation aggressiveness |
| Episodes-to-best | #episodes before best CR epoch | Sample efficiency (coarse) |
| Actions-to-best | #RL steps before best CR epoch | Sample efficiency (granular) |
| Q-margin | mean(Q_extend) - mean(Q_cut) | Policy preference direction |
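The ratio-style metrics translate into one-liners. A sketch of CR, CUT%, and Q-margin from the glossary formulas (function names and the batch layout of Q-value pairs are assumptions):

```python
import numpy as np


def cr(od, basesim):
    """CR (ValCR): normalized quality; lower is better."""
    return od / basesim


def cut_pct(cuts, extends):
    """CUT%: fraction of CUT actions among all actions, in percent."""
    return cuts / (cuts + extends) * 100


def q_margin(q_pairs):
    """Q-margin over rows of [Q_extend, Q_cut]; positive means the policy
    prefers EXTEND on average."""
    q = np.asarray(q_pairs, dtype=float)
    return float(q[:, 0].mean() - q[:, 1].mean())


print(cut_pct(12, 88))                     # -> 12.0
print(q_margin([[1.0, 0.5], [0.5, 0.0]]))  # -> 0.5
```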