System Architecture


Three-Layer Hybrid Design

Q-RLSTC operates as a three-layer hybrid system. Each layer runs classically or on quantum hardware depending on algorithmic fit, hardware feasibility, and how often the layer is invoked inside the training loop.

┌──────────────────────────────────────────────────────────────────────┐
│ Layer 1: Environment & Distance Computation (CLASSICAL)              │
│   Trajectory → Incremental IED → 5D state observation                │
├──────────────────────────────────────────────────────────────────────┤
│ Layer 2: Policy Network (QUANTUM or CLASSICAL)                       │
│   State → Angle Encoding → 5q HEA (3L) → Z-Expectation → Q-values   │
│   OR: State → MLP (various sizes) → Q-values                        │
├──────────────────────────────────────────────────────────────────────┤
│ Layer 3: Clustering & Evaluation (CLASSICAL)                         │
│   Segments → Incremental center updates → ValCR evaluation           │
└──────────────────────────────────────────────────────────────────────┘

Data Flow

Raw Data (T-Drive / GeoLife pickle)
TrajRLclus.__init__()           ← Load trajectories + cluster centers
TrajRLclus.reset(episode)      ← Compute initial IED, 5D observation
Agent.act(observation)          ← VQ-DQN or MLP → Q-values → ε-greedy
    ├── Q(EXTEND) ← action 0
    └── Q(CUT)    ← action 1
TrajRLclus.step(action)        ← Incremental IED update, segment assignment
    ├── New observation (5D) → loop back to Agent.act()
    └── Segment → cluster_dict[k][0] (IED) + [1] (traj) + [4] (length)

Episode end:
compute_overdist(clusters_E)    ← Raw ValCR = mean(IED) / base_similarity
compute_overdist_per_point()    ← nValCR = mean(IED/len) / base_similarity
compute_overdist_length_weighted() ← wValCR = total_IED/total_pts / base
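
The three episode-end metrics differ only in how segment length enters the average. The sketch below computes all three from the formulas in the annotations above. The function name `valcr_variants` and the `(ied, n_points)` input layout are hypothetical and chosen for illustration; they are not the signatures of `compute_overdist` and its siblings.

```python
from statistics import mean


def valcr_variants(segments, base_similarity):
    """Return (raw, per_point, length_weighted) ValCR values.

    segments: list of (ied, n_points) pairs, one per clustered segment
              (hypothetical layout, for illustration only).
    base_similarity: positive normalisation constant.
    """
    if not segments:
        raise ValueError("at least one segment is required")
    if base_similarity <= 0:
        raise ValueError("base_similarity must be positive")

    ieds = [ied for ied, _ in segments]
    lengths = [n for _, n in segments]

    # Raw ValCR: mean(IED) / base_similarity
    raw = mean(ieds) / base_similarity
    # nValCR: mean(IED / len) / base_similarity, every segment counts equally
    per_point = mean([ied / n for ied, n in segments]) / base_similarity
    # wValCR: total_IED / total_pts / base, long segments dominate
    length_weighted = (sum(ieds) / sum(lengths)) / base_similarity
    return raw, per_point, length_weighted


if __name__ == "__main__":
    # Two segments: IED 2.0 over 4 points, IED 6.0 over 2 points.
    print(valcr_variants([(2.0, 4), (6.0, 2)], base_similarity=2.0))
```

The per-point and length-weighted variants coincide only when all segments have the same length, which is why the two are reported separately.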

Design Philosophy

Hybrid First

Pure quantum solutions are not viable on NISQ hardware. Q-RLSTC applies quantum computation only where it provides value (the policy network) and keeps everything else classical. This split is architecturally correct, not a compromise. See Justifications.

NISQ Awareness

Every circuit design decision prioritises noise resilience:

  • Shallow depth: 3 HEA layers (errors compound with depth)
  • Limited qubits: 5 qubits (fewer error sources)
  • Statistical averaging: configurable shot counts (128–4096)
  • Linear entanglement: fewer 2-qubit gates than ring or full connectivity
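
To make the last two bullets concrete, the sketch below counts two-qubit gates for the three entanglement topologies and estimates the shot-noise standard error of a Z-expectation. It assumes one two-qubit gate per coupled pair per layer, which may differ from the actual circuit; the function names are illustrative.

```python
import math


def two_qubit_gate_count(n_qubits, layers, topology):
    """Two-qubit gates in an HEA, assuming one gate per coupled pair per layer."""
    per_layer = {
        "linear": n_qubits - 1,                  # chain: 0-1, 1-2, ...
        "ring": n_qubits,                        # chain plus the closing link
        "full": n_qubits * (n_qubits - 1) // 2,  # every pair
    }[topology]
    return per_layer * layers


def shot_standard_error(expectation, shots):
    """Standard error of a <Z> estimate from `shots` independent +1/-1 outcomes."""
    # A +1/-1 outcome with mean m has variance 1 - m**2.
    return math.sqrt((1.0 - expectation ** 2) / shots)


if __name__ == "__main__":
    for topology in ("linear", "ring", "full"):
        print(topology, two_qubit_gate_count(5, 3, topology))
    for shots in (128, 4096):
        print(shots, shot_standard_error(0.0, shots))
```

Under these assumptions a 5-qubit, 3-layer circuit needs 12 two-qubit gates with linear entanglement, against 15 for ring and 30 for full connectivity. The worst-case standard error (at an expectation of 0) falls from roughly 0.088 at 128 shots to roughly 0.016 at 4096 shots.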

Modularity

Components are designed for independent testing and replacement:

  • The VQ-DQN can be swapped for a classical MLP (Controls A/B/C) with an identical training pipeline
  • Noise models and shot counts are configurable via backends.py
  • All experiment hyperparameters are centralised in the PROTOCOL dict (run_thesis_experiments.py)
  • The distance module (rlstc_trajdistance.py) works identically with the quantum and classical agents
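
One way to picture the agent swap is a shared Q-function interface that the ε-greedy action selection depends on, regardless of which model sits behind it. The sketch below is illustrative only: `QFunction`, `LinearQ` and `epsilon_greedy` are hypothetical names, not classes from vqdqn_agent.py or spsa_classical_agent.py.

```python
import random
from typing import Protocol, Sequence

EXTEND, CUT = 0, 1  # action indices as used in the data flow


class QFunction(Protocol):
    """Anything that maps a 5D observation to two Q-values."""

    def q_values(self, observation: Sequence[float]) -> Sequence[float]: ...


class LinearQ:
    """A 5→2 linear map, the same shape as Control A (hypothetical class)."""

    def __init__(self, weights, biases):
        self.weights = weights  # 2 rows of 5 weights
        self.biases = biases    # 2 biases

    def q_values(self, observation):
        return [
            sum(w * x for w, x in zip(row, observation)) + b
            for row, b in zip(self.weights, self.biases)
        ]


def epsilon_greedy(model: QFunction, observation, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the argmax action."""
    if rng.random() < epsilon:
        return rng.choice((EXTEND, CUT))
    q = model.q_values(observation)
    return max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    model = LinearQ([[0.1] * 5, [0.3] * 5], [0.0, 0.0])
    print(epsilon_greedy(model, [1.0] * 5, epsilon=0.0))
```

Because `epsilon_greedy` only calls `q_values`, a quantum model exposing the same method could replace `LinearQ` without touching the selection logic.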

Agent Comparison

Agent      Architecture               Params  Implementation
VQ-DQN     5q × 3L HEA + affine head  34      vqdqn_agent.py
Control A  5→2 linear                 12      spsa_classical_agent.py
Control B  5→64→2 MLP                 514     spsa_classical_agent.py
Control C  5→32→32→2 MLP              1,314   spsa_classical_agent.py
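
The classical parameter counts follow directly from the layer sizes (weights plus biases per dense layer). The sketch below reproduces them. For the VQ-DQN it assumes two rotation parameters per qubit per layer plus a four-parameter affine head (a scale and a bias for each of the two outputs); this decomposition is consistent with the stated total of 34 but is an assumption, not taken from vqdqn_agent.py.

```python
def mlp_param_count(sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def vqdqn_param_count(n_qubits=5, layers=3, rotations_per_qubit=2, head_params=4):
    """Assumed breakdown: rotation angles in the HEA plus the affine output head."""
    return n_qubits * layers * rotations_per_qubit + head_params


if __name__ == "__main__":
    print("Control A", mlp_param_count([5, 2]))
    print("Control B", mlp_param_count([5, 64, 2]))
    print("Control C", mlp_param_count([5, 32, 32, 2]))
    print("VQ-DQN   ", vqdqn_param_count())
```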

All agents share: SPSA optimizer, Double DQN, experience replay (5000), Huber loss, Q-value clamping (±10), TD target clamping (±10).
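
The shared training ingredients can be sketched in a few lines. The version below is a minimal illustration, not the project's code: the ±10 clamp comes from the text above, while the discount factor, the Huber threshold `delta`, the SPSA perturbation size `c` and all function names are assumptions.

```python
import random


def clamp(x, lo=-10.0, hi=10.0):
    """Clamp to the ±10 range used for Q-values and TD targets."""
    return max(lo, min(hi, x))


def huber(error, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails (delta is assumed)."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def double_dqn_target(reward, next_q_online, next_q_target, gamma, done):
    """Double DQN: the online net selects the action, the target net evaluates it."""
    if done:
        return clamp(reward)
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return clamp(reward + gamma * next_q_target[best])


def spsa_gradient(loss_fn, theta, c=0.1, rng=random):
    """SPSA estimate: two loss evaluations, however many parameters there are."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = (loss_fn(plus) - loss_fn(minus)) / (2.0 * c)
    return [diff / d for d in delta]


if __name__ == "__main__":
    target = double_dqn_target(1.0, [0.2, 0.9], [5.0, 3.0], gamma=0.9, done=False)
    print(target, huber(target - 2.0))
    print(spsa_gradient(lambda th: sum(t * t for t in th), [1.0, -2.0]))
```

SPSA needs only two loss evaluations per update, which matters when every evaluation of the quantum policy costs a batch of circuit executions.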

Quantum Scope Boundary

Component             Implementation    Rationale
Q-value estimation    Quantum (VQ-DQN)  Empirically parameter-efficient; clean 5→2 mapping
State encoding        Quantum (Angle)   Bounded features → rotation angles
Distance computation  Classical (IED)   Incremental O(1) updates; quantum would require full re-encoding
Clustering            Classical         Incremental center updates; no quantum centroid algorithm exists
Reward computation    Classical         Single floating-point arithmetic in experiment runner
Data loading          Classical         Pickle I/O
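
The "bounded features → rotation angles" rationale can be illustrated on a single qubit. The sketch below assumes features scaled to [0, 1], a linear map onto [0, π], and an RY rotation applied to |0⟩; the actual scaling and gate choice in Q-RLSTC may differ. For this state the Z-expectation is cos θ, so a bounded feature maps monotonically onto [-1, 1].

```python
import math


def feature_to_angle(x, lo=0.0, hi=1.0):
    """Map a bounded feature linearly onto a rotation angle in [0, pi] (assumed scaling)."""
    x = min(max(x, lo), hi)  # clip values that stray outside the bounds
    return math.pi * (x - lo) / (hi - lo)


def z_expectation_after_ry(theta):
    """<Z> for RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>."""
    p0 = math.cos(theta / 2.0) ** 2  # probability of measuring |0>
    p1 = math.sin(theta / 2.0) ** 2  # probability of measuring |1>
    return p0 - p1


if __name__ == "__main__":
    for feature in (0.0, 0.25, 0.5, 1.0):
        theta = feature_to_angle(feature)
        print(feature, round(theta, 4), round(z_expectation_after_ry(theta), 4))
```

This shows the encoding step alone; in the full policy network the HEA layers act between the encoding and the measurement.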
