# RLSTC vs. Q-RLSTC: Technical Comparison

A side-by-side analysis across 13 dimensions, covering architecture, design decisions, and all four Q-RLSTC versions (A, B, C, D).
## 1. Architecture Overview

> [!IMPORTANT]
> The "RLSTC" column below describes the original RLSTC paper's architecture (SGD, soft update, single DQN). For controlled experiments, the classical MLP baselines intentionally mirror Q-RLSTC's training setup (SPSA, hard copy, Double DQN) so that the function approximator is the only independent variable. See Experimental Design for the controlled comparison specification.
| Dimension | RLSTC (Original Paper) | Q-RLSTC (This Implementation) |
| --- | --- | --- |
| Policy network | Classical DQN (TF 1.x / Keras) | VQ-DQN (Qiskit parameterised circuit) |
| Optimizer | SGD (lr = 0.001) | SPSA / m-SPSA (gradient-free, NISQ-suitable) |
| Distance / clustering | Incremental IED (custom) | IED (ported) + classical k-means + incremental OD proxy |
| Loss function | Huber loss | Huber loss (same) |
| Target network | Soft update (τ = 0.05) | Periodic hard copy (`target_update_freq`) |
| Double DQN | No | Yes |
## 2. State Representation

### Classical RLSTC — 5 Features

| # | Feature | Source |
| --- | --- | --- |
| 0 | `overall_sim` — OD to nearest cluster centre | `MDP.py` |
| 1 | `min_sim` — minimum per-point similarity | `MDP.py` |
| 2 | `segment_len` — current segment point count | `MDP.py` |
| 3 | `traj_progress` — fraction of the trajectory consumed | `MDP.py` |
| 4 | `seg_count` — segments created so far | `MDP.py` |
### Q-RLSTC Version A — 5 Features

| # | Feature | How it differs from RLSTC |
| --- | --- | --- |
| 0 | `od_segment` — projected OD if we split | Proxy-based, not full IED |
| 1 | `od_continue` — projected OD if we extend | Running average, not global recalculation |
| 2 | `baseline_cost` — MDL compression score | TRACLUS-inspired; replaces `min_sim` |
| 3 | `len_backward` — normalised segment length | Same concept, different normalisation |
| 4 | `len_forward` — remaining trajectory | Same concept |
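As a minimal sketch (helper names are hypothetical; the real extraction lives in `data/features.py`), the Version A state vector can be assembled and arctan-bounded for angle encoding (Section 12: "bounded via arctan") like this:

```python
import math

def build_state_a(od_segment, od_continue, baseline_cost,
                  len_backward, len_forward):
    """Assemble the 5-D Version A state vector, in the order of the table above."""
    return [od_segment, od_continue, baseline_cost, len_backward, len_forward]

def bound_for_encoding(state):
    """Map each (possibly unbounded) feature into (-pi/2, pi/2) via arctan,
    so every value is a valid RY rotation angle for angle encoding."""
    return [math.atan(x) for x in state]
```

The arctan squashing keeps features with different scales in a common bounded range without clipping.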
### Q-RLSTC Version B — 8 Features

Adds three quantum-native features to Version A:

| # | Feature | Rationale |
| --- | --- | --- |
| 5 | `angle_spread` — variance of arctan-encoded features | Bloch-sphere spread |
| 6 | `curvature_gradient` — rate of curvature change | Second-order geometric signal |
| 7 | `segment_density` — points per unit distance | Captures congestion without explicit speed |
### Q-RLSTC Version C — 5D + Memory

Same 5 features as Version A, plus a shadow qubit (qubit 0) that persists quantum state across time steps, creating recurrent memory without additional classical features.
### Q-RLSTC Version D — 5 Features (VLDB Exact)

| # | Feature | Source |
| --- | --- | --- |
| 0 | `OD_s` — OD if we CUT here | Equation (19) of the VLDB paper |
| 1 | `OD_n` — OD if we EXTEND | Same |
| 2 | `OD_b` — TRACLUS expert baseline | Ablation-confirmed improvement |
| 3 | `L_b` — normalised backward segment length | Same |
| 4 | `L_f` — normalised forward remaining length | Same |
## 3. Action Space

| Version | Actions | Description |
| --- | --- | --- |
| A, B | 2 | EXTEND (0) or CUT (1) |
| C | 3 | EXTEND (0), CUT (1), DROP (2) — actively filters noise |
| D | 2–3 | EXTEND, CUT, optional SKIP(S) that fast-forwards S points |
## 4. Reward Functions

| Component | RLSTC | Q-RLSTC (A/B) | Q-RLSTC (C) | Q-RLSTC (D) |
| --- | --- | --- | --- | --- |
| Main signal | ΔOD = last_od − current_od (full IED) | α · od_improvement (proxy) | Same + DROP penalty | OD(s_t) − OD(s_{t+1}) (paper exact) |
| Boundary quality | None | β · boundary_sharpness | Same | None (paper doesn't use it) |
| Over-segmentation | Implicit via `MIN_SEGMENT_LEN` | Explicit −penalty in reward | Same + DROP micro-penalty (−0.05) | Implicit |
| SKIP reward | N/A | N/A | N/A | +0.05 × S (linear, low-variance segments) |
| Markov safety | Depends on global cluster state | Uses only incremental quantities | Same | Same |
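A minimal sketch of the A/B shaped reward, assuming illustrative coefficient values for the α, β, and penalty terms named above (the real values live in the configuration dataclasses):

```python
def reward_ab(od_improvement, boundary_sharpness, over_segmented,
              alpha=1.0, beta=0.5, seg_penalty=0.1):
    """Shaped reward for Versions A/B (sketch; coefficients are illustrative).

    alpha * od_improvement     -- proxy OD gain (main signal)
    beta  * boundary_sharpness -- boundary-quality bonus
    seg_penalty                -- explicit over-segmentation penalty
    """
    r = alpha * od_improvement + beta * boundary_sharpness
    if over_segmented:  # e.g. the new segment is shorter than MIN_SEGMENT_LEN
        r -= seg_penalty
    return r
```

The explicit penalty term is what the table contrasts with RLSTC's purely implicit `MIN_SEGMENT_LEN` guard.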
## 5. Quantum Circuit

| Aspect | Version A | Version B | Version C | Version D |
| --- | --- | --- | --- | --- |
| Qubits | 5 | 8 | 6 (5 + 1 shadow) | 5 |
| Encoding | Angle (RY) | Angle (RY) | Angle (RY) | Angle (RY) |
| Ansatz | HEA (RY-RZ + linear CNOT) | HEA | EQC (RZ + circular CNOT) | HEA (3 layers) |
| Variational layers | 2 | 2 | 2 | 3 |
| Trainable params | 20 | 32 | ~24 | 30 |
| Entanglement | 4 CNOTs (linear) | 7 CNOTs (linear) | 6 CNOTs (circular) | 4 CNOTs (linear) |
| Data re-uploading | Yes | Yes | Yes | Yes |
| Readout | ⟨Z₀⟩, ⟨Z₁⟩ | w·⟨Z⟩ + w·⟨ZZ⟩ | Softmax π(a\|s) | ⟨Z₀⟩, ⟨Z₁⟩, ⟨Z₂⟩ |
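The trainable-parameter counts above follow directly from qubits × layers × rotations per qubit per layer (two, RY-RZ, in the HEA; CNOTs carry no parameters). A quick sketch to check the A/B/D figures (Version C's EQC count is listed only approximately):

```python
def hea_param_count(n_qubits, n_layers, rotations_per_qubit=2):
    """Trainable parameters in a hardware-efficient ansatz (HEA): each
    variational layer applies RY-RZ (2 rotations) to every qubit, and the
    entangling CNOTs contribute no trainable parameters."""
    return n_qubits * n_layers * rotations_per_qubit

# Version A: 5 qubits x 2 layers x 2 rotations = 20
# Version B: 8 qubits x 2 layers x 2 rotations = 32
# Version D: 5 qubits x 3 layers x 2 rotations = 30
```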
## 6. Optimizer

| Aspect | RLSTC | Q-RLSTC (A/B/D) | Q-RLSTC (C) |
| --- | --- | --- | --- |
| Method | SGD with backprop | SPSA (gradient-free) | m-SPSA (momentum-averaged) |
| Evals per step | 1 (forward + backward) | 2 (forward only) | 2 + EMA smoothing |
| Shot-noise handling | N/A | Robust by design | Extra-robust via momentum |
| Gradient clipping | No | Yes (max norm 1.0) | Yes |
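A minimal SPSA sketch under stated assumptions (toy quadratic loss, standard decaying gain sequences; function names are illustrative): two loss evaluations per step yield the stochastic gradient estimate, and the step is clipped to max norm 1.0 as in the table above.

```python
import math
import random

def spsa_step(theta, loss_fn, a_k, c_k, rng, max_norm=1.0):
    """One SPSA update: evaluate the loss at theta +/- c_k * delta (two
    evaluations total), form the gradient estimate, clip, and step."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]          # Rademacher perturbation
    plus  = loss_fn([t + c_k * d for t, d in zip(theta, delta)])
    minus = loss_fn([t - c_k * d for t, d in zip(theta, delta)])
    grad = [(plus - minus) / (2.0 * c_k * d) for d in delta]  # d in {-1,+1}
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:                                       # gradient clipping
        grad = [g * max_norm / norm for g in grad]
    return [t - a_k * g for t, g in zip(theta, grad)]

# Usage sketch: minimise a toy quadratic with standard decaying gains.
rng = random.Random(0)
theta = [1.0, -0.8, 0.6]
loss = lambda th: sum(t * t for t in th)
for k in range(200):
    a_k = 0.2 / (k + 1) ** 0.602
    c_k = 0.1 / (k + 1) ** 0.101
    theta = spsa_step(theta, loss, a_k, c_k, rng)
```

The two-evaluation cost per step is independent of the parameter count, which is the scaling argument made in Section 12.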
## 7. Distance Computation

| Aspect | RLSTC | Q-RLSTC |
| --- | --- | --- |
| Primary metric | Incremental IED | IED (ported in `trajdistance.py`) + OD proxy |
| Per-step cost | O(1) amortised | O(1) |
| Full computation | Every CUT action | Episode end (k-means) or incremental update |
| Available metrics | IED, Fréchet, DTW | IED, Fréchet, DTW, OD, silhouette, F1 |
| Incremental updates | `cluster.py` | `classical_kmeans.py` (ported) |
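The O(1) per-step cost rests on the running-average form of the OD proxy (Section 2 notes it is a "running average, not global recalc"). A minimal sketch with hypothetical names:

```python
def update_od_proxy(mean_od, count, new_point_od):
    """Fold one point's OD contribution into the running average in O(1),
    instead of recomputing IED over the whole segment."""
    count += 1
    mean_od += (new_point_od - mean_od) / count
    return mean_od, count

# Usage: the incremental mean matches the batch mean over the same values.
mean, n = 0.0, 0
for od in [0.2, 0.4, 0.6]:
    mean, n = update_od_proxy(mean, n, od)
```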
## 8. Data Structures

| Structure | RLSTC | Q-RLSTC |
| --- | --- | --- |
| Point | Plain class with `x`, `y`, `t` | `@dataclass` with `distance()`, `to_array()` |
| Segment | Class with distance methods | Implicit (index range) |
| Trajectory | `Traj(points, size, ts, te)` | `@dataclass` with boundaries, labels |
| Replay buffer | `deque(maxlen=2000)` inside the DQN class | Separate `ReplayBuffer(5000)` |
| Cluster state | Mutable dict `{id: [data]}` | Same format (ported), `@dataclass ClusterState` |
| Config | Hardcoded constants | Nested `@dataclass` hierarchy |
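A sketch of the `Point` dataclass described above (field and method names from the table; the exact implementation in `point.py`'s Q-RLSTC counterpart may differ):

```python
from dataclasses import dataclass
import math

@dataclass
class Point:
    """Q-RLSTC point record: spatial coordinates plus a timestamp."""
    x: float
    y: float
    t: float

    def distance(self, other: "Point") -> float:
        """Euclidean distance in the spatial plane (time is ignored)."""
        return math.hypot(self.x - other.x, self.y - other.y)

    def to_array(self) -> list:
        """Flatten to [x, y, t] for downstream feature extraction."""
        return [self.x, self.y, self.t]
```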
## 9. Version Comparison Summary

| Dimension | A (Classical Parity) | B (Quantum Enhanced) | C (Next-Gen Q-RNN) | D (VLDB Aligned) |
| --- | --- | --- | --- | --- |
| Goal | Isolate quantum vs. classical | Explore parameter efficiency | Full quantum-native architecture | Strict VLDB paper reproduction |
| Qubits | 5 | 8 | 6 | 5 |
| Features | 5D (matches RLSTC) | 8D (3 quantum-native) | 5D + shadow memory | 5D (VLDB exact) |
| Readout | Single-qubit Z | Multi-observable (Z + ZZ) | Softmax distribution | Multi-qubit Z |
| Params | 20 | 32 | ~24 | 30 |
| Actions | 2 | 2 | 3 (+ DROP) | 2–3 (+ optional SKIP) |
| Agent | ε-greedy DQN | ε-greedy DQN | SAC | ε-greedy DQN |
| Optimizer | SPSA | SPSA | m-SPSA | SPSA |
| Shots | Fixed (512 / 4096) | Fixed | Adaptive (32 → 512) | Fixed |
| Config | `version="A"` | `version="B"` | `version="C"` | `version="D"` |
## 10. Noise & Hardware

| Aspect | RLSTC | Q-RLSTC |
| --- | --- | --- |
| Noise simulation | None | Full stack (ideal, simple, Eagle, Heron) |
| Error mitigation | None | Readout calibration matrix |
| Backend | CPU (TensorFlow) | Qiskit Aer (configurable) |
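Readout mitigation via a calibration matrix can be sketched as inverting the measured confusion matrix (shown here for a single qubit; the calibration values and function name are illustrative, not the implementation in `quantum/mitigation.py`):

```python
def mitigate_readout(p_meas, calib):
    """Invert a 2x2 readout calibration (confusion) matrix to recover the
    true outcome distribution: p_meas = M @ p_true  =>  p_true = M^-1 @ p_meas.
    calib[i][j] = P(measure i | prepared j)."""
    (a, b), (c, d) = calib
    det = a * d - b * c
    p0, p1 = p_meas
    true0 = (d * p0 - b * p1) / det
    true1 = (-c * p0 + a * p1) / det
    # Clamp tiny negative values caused by shot noise, then renormalise.
    true0, true1 = max(true0, 0.0), max(true1, 0.0)
    s = true0 + true1
    return [true0 / s, true1 / s]
```

In practice the calibration matrix is estimated by preparing and measuring each basis state on the backend before training.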
## 11. Training Pipeline

| Aspect | RLSTC | Q-RLSTC (A/B/D) | Q-RLSTC (C) |
| --- | --- | --- | --- |
| Loop | Iterate points → EXTEND/CUT | Same | Same + DROP/SKIP |
| Replay | Internal to DQN (2,000) | Separate buffer (5,000) | Same |
| Target update | Soft (τ = 0.05 every batch) | Hard copy every N episodes | Same |
| Double DQN | No | Yes | Yes |
| Anti-gaming | `MIN_SEGMENT_LEN` | Same + explicit reward penalty | Same + DROP penalty |
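The Double DQN update combines the online network (action selection) with the hard-copied target network (action evaluation); a minimal sketch with illustrative Q-functions:

```python
def double_dqn_target(reward, next_state, q_online, q_target,
                      gamma=0.99, done=False):
    """Double DQN target: the online network picks the greedy next action,
    the periodically hard-copied target network evaluates it."""
    if done:
        return reward
    q_next = q_online(next_state)
    best = max(range(len(q_next)), key=lambda a: q_next[a])
    return reward + gamma * q_target(next_state)[best]
```

Decoupling selection from evaluation is what removes the maximisation bias of the single-network DQN used in the original RLSTC.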
## 12. Design Rationale

| Decision | Rationale |
| --- | --- |
| Only the policy is quantum | Fixed I/O (5 → 2); distance computation needs O(1) updates |
| SPSA over parameter-shift | 2 evals per step vs. 40 (parameter-shift needs 2 per parameter); scales to larger circuits |
| Angle encoding | 1 feature → 1 qubit; bounded via arctan |
| Data re-uploading | Expressivity without extra depth; proven technique |
| Version A exists | Scientific control: isolate the function approximator |
| Version B exists | Explore whether more qubits + richer features help |
| Version C exists | Full quantum-native: shadow memory, EQC, SAC, adaptive shots |
| Version D exists | VLDB paper reproduction: exact MDP → VQC substitution |
| IED ported to Q-RLSTC | Classical parity: identical distance metric for fair comparison |
| MDL simplification ported | Ensures identical preprocessing between systems |
| Pickle loader | Direct data sharing between RLSTCcode and Q-RLSTC |
## 13. File Reference

### Classical RLSTC

| File | Purpose |
| --- | --- |
| `rl_nn.py` | DQN: model, training, target network |
| `MDP.py` | Environment: state features, reward, step logic |
| `rl_train.py` | Training-loop orchestration |
| `rl_estimate.py` | Evaluation / inference |
| `cluster.py` | Incremental IED clustering |
| `trajdistance.py` | IED, Fréchet, DTW distances |
| `segment.py` | Segment distance metrics |
| `point.py` | Point data structure |
| `preprocessing.py` | MDL simplification, normalisation |
### Q-RLSTC

| File | Purpose |
| --- | --- |
| `quantum/vqdqn_circuit.py` | Circuit: encoding, HEA, measurement |
| `rl/vqdqn_agent.py` | Agent: ε-greedy, Double DQN, target network |
| `rl/train.py` | Training loop + MDP environment |
| `rl/spsa.py` | SPSA optimizer |
| `rl/replay_buffer.py` | Experience replay |
| `config.py` | Configuration dataclasses (A/B/C/D) |
| `data/features.py` | State feature extraction (A, B, D) |
| `data/preprocessing.py` | MDL simplification + TRACLUS pipeline |
| `data/synthetic.py` | Trajectory generation |
| `clustering/classical_kmeans.py` | K-means + incremental cluster updates |
| `clustering/metrics.py` | OD, silhouette, F1 |
| `clustering/trajdistance.py` | IED, Fréchet, DTW (ported from RLSTC) |
| `clustering/pickle_loader.py` | Load RLSTCcode pickle data files |
| `quantum/backends.py` | Noise models (ideal, Eagle, Heron) |
| `quantum/mitigation.py` | Readout error mitigation |
| `experiments/run_cross_comparison.py` | Classical ↔ quantum comparison runner |
| `experiments/data_bridge.py` | RLSTCcode → Q-RLSTC data conversion |
Next: Noise & Hardware Simulation →