System Architecture
Three-Layer Hybrid Design
Q-RLSTC operates as a three-layer hybrid system. Each layer is assigned to classical or quantum execution based on algorithmic fit, hardware feasibility, and how often the layer runs inside the training loop.
┌──────────────────────────────────────────────────────────────────────┐
│ Layer 1: Environment & Distance Computation (CLASSICAL) │
│ Trajectory → Incremental IED → 5D state observation │
├──────────────────────────────────────────────────────────────────────┤
│ Layer 2: Policy Network (QUANTUM or CLASSICAL) │
│ State → Angle Encoding → 5q HEA (3L) → Z-Expectation → Q-values │
│ OR: State → MLP (various sizes) → Q-values │
├──────────────────────────────────────────────────────────────────────┤
│ Layer 3: Clustering & Evaluation (CLASSICAL) │
│ Segments → Incremental center updates → ValCR evaluation │
└──────────────────────────────────────────────────────────────────────┘
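The quantum branch of Layer 2 can be illustrated with a small statevector simulation. This is a minimal sketch, not the code in vqdqn_agent.py. It assumes features scaled to [0, 1] and encoded as RY(π·x), RY and RZ rotations on every qubit in each HEA layer, a linear CNOT chain, and Q-values read from ⟨Z⟩ on qubits 0 and 1 through a per-action scale and bias. Those assumptions give 30 + 4 = 34 parameters, which matches the VQ-DQN total reported in the agent comparison, but the actual gate set and readout may differ.

```python
import cmath
import math

N_QUBITS, N_LAYERS = 5, 3
# Assumed split: 2 rotation angles per qubit per layer, plus scale and bias per action.
N_PARAMS = N_QUBITS * N_LAYERS * 2 + 4


def apply_1q(state, q, m):
    """Apply a 2x2 matrix m to qubit q (qubit 0 is the least significant bit)."""
    out = list(state)
    bit = 1 << q
    for i in range(len(state)):
        if not i & bit:
            a, b = state[i], state[i | bit]
            out[i] = m[0][0] * a + m[0][1] * b
            out[i | bit] = m[1][0] * a + m[1][1] * b
    return out


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def rz(phi):
    return [[cmath.exp(-1j * phi / 2), 0], [0, cmath.exp(1j * phi / 2)]]


def apply_cnot(state, control, target):
    """Swap the target amplitudes wherever the control bit is 1."""
    out = list(state)
    cbit, tbit = 1 << control, 1 << target
    for i in range(len(state)):
        if i & cbit and not i & tbit:
            out[i], out[i | tbit] = state[i | tbit], state[i]
    return out


def z_expectation(state, q):
    """<Z> on qubit q: +1 weight for bit 0, -1 weight for bit 1."""
    bit = 1 << q
    return sum((-1 if i & bit else 1) * abs(a) ** 2 for i, a in enumerate(state))


def q_values(obs, weights, scale, bias):
    """obs: 5 features in [0, 1]; weights: N_LAYERS lists of (theta, phi) per qubit."""
    state = [0j] * (2 ** N_QUBITS)
    state[0] = 1 + 0j
    for q, x in enumerate(obs):              # angle encoding (assumed RY(pi * x))
        state = apply_1q(state, q, ry(math.pi * x))
    for layer in weights:                    # hardware-efficient ansatz layers
        for q, (theta, phi) in enumerate(layer):
            state = apply_1q(state, q, ry(theta))
            state = apply_1q(state, q, rz(phi))
        for q in range(N_QUBITS - 1):        # linear entanglement chain
            state = apply_cnot(state, q, q + 1)
    # Affine head (assumed): one scale and one bias per action.
    return [scale[a] * z_expectation(state, a) + bias[a] for a in range(2)]
```

Calling `q_values(obs, weights, scale, bias)` returns a two-element list, with index 0 for EXTEND and index 1 for CUT. On hardware or a shot-based simulator the exact expectation would be replaced by an average over measurement outcomes.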
Data Flow
Raw Data (T-Drive / GeoLife pickle)
    │
    ▼
TrajRLclus.__init__()        ← Load trajectories + cluster centers
    │
    ▼
TrajRLclus.reset(episode)    ← Compute initial IED, 5D observation
    │
    ▼
Agent.act(observation)       ← VQ-DQN or MLP → Q-values → ε-greedy
    │
    ├── Q(EXTEND) ← action 0
    └── Q(CUT)    ← action 1
    │
    ▼
TrajRLclus.step(action)      ← Incremental IED update, segment assignment
    │
    ├── New observation (5D) → loop back to Agent.act()
    └── Segment → cluster_dict[k][0] (IED) + [1] (traj) + [4] (length)

Episode end:
    │
    ▼
compute_overdist(clusters_E)        ← Raw ValCR = mean(IED) / base_similarity
compute_overdist_per_point()        ← nValCR = mean(IED/len) / base_similarity
compute_overdist_length_weighted()  ← wValCR = total_IED/total_pts / base_similarity
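The three episode-end metrics differ only in how segment length enters the average. The sketch below works through the arithmetic under a simplifying assumption: each assigned segment is reduced to a hypothetical `(ied, n_points)` pair, whereas the real functions read these values from `cluster_dict`. The function name `valcr_variants` is illustrative and not part of the codebase.

```python
def valcr_variants(segments, base_similarity):
    """Return (raw ValCR, nValCR, wValCR) for a list of (ied, n_points) pairs.

    Hypothetical data layout: one pair per assigned segment.
    """
    if not segments:
        raise ValueError("need at least one segment")
    if base_similarity <= 0:
        raise ValueError("base_similarity must be positive")
    if any(n <= 0 for _, n in segments):
        raise ValueError("segment lengths must be positive")

    total_ied = sum(ied for ied, _ in segments)
    total_pts = sum(n for _, n in segments)

    # Raw ValCR: mean IED per segment, normalised by the baseline.
    raw = (total_ied / len(segments)) / base_similarity
    # nValCR: mean of per-point IED, so every segment counts equally.
    per_point = (sum(ied / n for ied, n in segments) / len(segments)) / base_similarity
    # wValCR: total IED over total points, so long segments count more.
    weighted = (total_ied / total_pts) / base_similarity
    return raw, per_point, weighted
```

For example, two segments `(2.0, 4)` and `(6.0, 6)` with a baseline of 2.0 give a raw value of 2.0, a per-point value of 0.375, and a length-weighted value of 0.4.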
Design Philosophy
Hybrid First
Pure quantum solutions are not viable on NISQ hardware. Q-RLSTC applies quantum computation only where it provides value (the policy network) and keeps everything else classical. This is a deliberate architectural choice, not a compromise. See Justifications.
NISQ Awareness
Every circuit design decision prioritises noise resilience:
- Shallow depth: 3 HEA layers (errors compound with depth)
- Limited qubits: 5 qubits (fewer error sources)
- Statistical averaging: configurable shot counts (128–4096)
- Linear entanglement: fewer 2-qubit gates than ring or full connectivity
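Two of the points above can be made concrete with simple arithmetic. The sketch below assumes one two-qubit gate per connected pair per layer, and treats each shot as a ±1 outcome, so that the standard error of an estimated ⟨Z⟩ is sqrt((1 − ⟨Z⟩²) / shots). The function names are illustrative and not taken from backends.py.

```python
import math


def two_qubit_gates(n_qubits, n_layers, topology):
    """Count entangling gates, assuming one gate per connected pair per layer."""
    per_layer = {
        "linear": n_qubits - 1,                  # chain q0-q1-...-q(n-1)
        "ring": n_qubits,                        # chain plus one closing link
        "full": n_qubits * (n_qubits - 1) // 2,  # every pair
    }[topology]
    return per_layer * n_layers


def z_standard_error(expectation, shots):
    """Standard error of a <Z> estimate averaged over `shots` +/-1 outcomes."""
    return math.sqrt((1.0 - expectation ** 2) / shots)
```

Under these assumptions, a 5-qubit, 3-layer circuit uses 12 two-qubit gates with linear entanglement versus 15 (ring) or 30 (full). The worst-case standard error of a ⟨Z⟩ estimate (at ⟨Z⟩ = 0) falls from about 0.088 at 128 shots to about 0.016 at 4096 shots.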
Modularity
Components are designed for independent testing and replacement:
- The VQ-DQN can be swapped for a classical network (Controls A/B/C) with an identical training pipeline
- Noise models and shot counts are configurable via backends.py
- All experiment hyperparameters are centralised in the PROTOCOL dict (run_thesis_experiments.py)
- Distance module (rlstc_trajdistance.py) works identically with both systems
Agent Comparison
| Agent | Architecture | Params | Implementation |
|---|---|---|---|
| VQ-DQN | 5q × 3L HEA + affine head | 34 | vqdqn_agent.py |
| Control A | 5→2 linear | 12 | spsa_classical_agent.py |
| Control B | 5→64→2 MLP | 514 | spsa_classical_agent.py |
| Control C | 5→32→32→2 MLP | 1,314 | spsa_classical_agent.py |
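The parameter counts in the table can be reproduced with a few lines of arithmetic. For the classical controls, each dense layer contributes weights plus biases. For VQ-DQN only the total of 34 comes from the table; the split into 30 rotation angles (two per qubit per layer) and a 4-parameter affine head is an assumption made for this sketch.

```python
def dense_params(sizes):
    """Weights plus biases for a fully connected network with these layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def vqdqn_params(n_qubits=5, n_layers=3, rotations_per_qubit=2, head_params=4):
    """Assumed breakdown: rotation angles in the ansatz plus an affine output head."""
    return n_qubits * n_layers * rotations_per_qubit + head_params


counts = {
    "VQ-DQN": vqdqn_params(),                   # 5 * 3 * 2 + 4
    "Control A": dense_params([5, 2]),          # 5*2 + 2
    "Control B": dense_params([5, 64, 2]),      # (5*64 + 64) + (64*2 + 2)
    "Control C": dense_params([5, 32, 32, 2]),  # 192 + 1056 + 66
}
```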
All agents share: SPSA optimizer, Double DQN, experience replay (5000), Huber loss, Q-value clamping (±10), TD target clamping (±10).
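The shared training components can be sketched in a few functions. This is an illustrative reconstruction under stated assumptions, not the project's implementation: the discount factor `gamma`, the Huber threshold `delta`, and the SPSA step sizes `a` and `c` are placeholder values, and all function names are hypothetical. Only the ±10 clamping bound comes from the text above.

```python
import random


def clamp(x, limit=10.0):
    """Clip a value to [-limit, +limit]."""
    return max(-limit, min(limit, x))


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.9):
    """Double DQN: the online net picks the action, the target net scores it."""
    if done:
        return clamp(reward)
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return clamp(reward + gamma * clamp(next_q_target[best]))


def huber(error, delta=1.0):
    """Quadratic near zero, linear beyond delta."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def spsa_step(params, loss_fn, a=0.05, c=0.1, rng=random):
    """One SPSA update: two loss evaluations, regardless of parameter count."""
    delta = [rng.choice((-1.0, 1.0)) for _ in params]   # Rademacher perturbation
    plus = [p + c * d for p, d in zip(params, delta)]
    minus = [p - c * d for p, d in zip(params, delta)]
    diff = (loss_fn(plus) - loss_fn(minus)) / (2.0 * c)
    # Gradient estimate for parameter i is diff / delta[i].
    return [p - a * diff / d for p, d in zip(params, delta)]
```

Because SPSA needs only two loss evaluations per step, the same update rule applies unchanged to the 34-parameter circuit and to the larger classical controls, which is what makes a shared training pipeline possible.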
Quantum Scope Boundary
| Component | Implementation | Rationale |
|---|---|---|
| Q-value estimation | Quantum (VQ-DQN) | Empirically parameter-efficient; clean 5→2 mapping |
| State encoding | Quantum (Angle) | Bounded features → rotation angles |
| Distance computation | Classical (IED) | Incremental O(1) updates; quantum would require full re-encoding |
| Clustering | Classical | Incremental center updates; no quantum centroid method is practical on NISQ hardware |
| Reward computation | Classical | Simple floating-point arithmetic in the experiment runner |
| Data loading | Classical | Pickle I/O |