# RLSTC vs. Q-RLSTC: Technical Comparison

A side-by-side analysis across 13 dimensions, covering architecture, design decisions, and all four Q-RLSTC versions (A, B, C, D).
## 1. Architecture Overview

> [!IMPORTANT]
> The "RLSTC" column below describes the original RLSTC paper's architecture (SGD, soft update, single DQN). For controlled experiments, the classical MLP baselines intentionally mirror Q-RLSTC's training setup (SPSA, hard copy, Double DQN) so that the function approximator is the only independent variable. See Experimental Design for the controlled comparison specification.
| Dimension | RLSTC (Original Paper) | Q-RLSTC (This Implementation) |
| --- | --- | --- |
| Policy network | Classical DQN (TF 1.x / Keras) | VQ-DQN (Qiskit parameterised circuit) |
| Optimizer | SGD (lr = 0.001) | SPSA / m-SPSA (gradient-free, NISQ-suitable) |
| Distance / clustering | Incremental IED (custom) | IED (ported) + classical k-means + incremental OD proxy |
| Loss function | Huber loss | Huber loss (same) |
| Target network | Soft update (τ = 0.05) | Periodic hard copy (`target_update_freq`) |
| Double DQN | No | Yes |
## 2. State Representation

### Classical RLSTC — 5 Features

| # | Feature | Source |
| --- | --- | --- |
| 0 | `overall_sim` — OD to nearest cluster centre | `MDP.py` |
| 1 | `min_sim` — minimum per-point similarity | `MDP.py` |
| 2 | `segment_len` — current segment point count | `MDP.py` |
| 3 | `traj_progress` — fraction of the trajectory consumed | `MDP.py` |
| 4 | `seg_count` — segments created so far | `MDP.py` |
### Q-RLSTC Version A — 5 Features

| # | Feature | How it differs from RLSTC |
| --- | --- | --- |
| 0 | `od_segment` — projected OD if we split | Proxy-based, not full IED |
| 1 | `od_continue` — projected OD if we extend | Running average, not global recalculation |
| 2 | `baseline_cost` — MDL compression score | TRACLUS-inspired; replaces `min_sim` |
| 3 | `len_backward` — normalised segment length | Same concept, different normalisation |
| 4 | `len_forward` — remaining trajectory | Same concept |
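As a minimal sketch (helper names are hypothetical; the real extraction lives in `data/features.py`), the Version A state vector can be assembled and arctan-bounded for angle encoding (Section 12: "bounded via arctan") like this:

```python
import math

def build_state_a(od_segment, od_continue, baseline_cost,
                  len_backward, len_forward):
    """Assemble the 5-D Version A state vector, in the order of the table above."""
    return [od_segment, od_continue, baseline_cost, len_backward, len_forward]

def bound_for_encoding(state):
    """Map each (possibly unbounded) feature into (-pi/2, pi/2) via arctan,
    so every value is a valid RY rotation angle for angle encoding."""
    return [math.atan(x) for x in state]
```

The arctan squashing keeps features with different scales in a common bounded range without clipping.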
### Q-RLSTC Version B — 8 Features

Adds three quantum-native features to Version A:

| # | Feature | Rationale |
| --- | --- | --- |
| 5 | `angle_spread` — variance of arctan-encoded features | Bloch-sphere spread |
| 6 | `curvature_gradient` — rate of curvature change | Second-order geometric signal |
| 7 | `segment_density` — points per unit distance | Captures congestion without explicit speed |
### Q-RLSTC Version C — 5D + Memory

Same 5 features as Version A, plus a shadow qubit (qubit 0) that persists quantum state across time steps, creating recurrent memory without additional classical features.
### Q-RLSTC Version D — 5 Features (VLDB Exact)

| # | Feature | Source |
| --- | --- | --- |
| 0 | `OD_s` — OD if we CUT here | Equation (19) of the VLDB paper |
| 1 | `OD_n` — OD if we EXTEND | Same |
| 2 | `OD_b` — TRACLUS expert baseline | Ablation-confirmed improvement |
| 3 | `L_b` — normalised backward segment length | Same |
| 4 | `L_f` — normalised forward remaining length | Same |
## 3. Action Space

| Version | Actions | Description |
| --- | --- | --- |
| A, B | 2 | EXTEND (0) or CUT (1) |
| C | 3 | EXTEND (0), CUT (1), DROP (2) — actively filters noise |
| D | 2–3 | EXTEND, CUT, optional SKIP(S) that fast-forwards S points |
## 4. Reward Functions

| Component | RLSTC | Q-RLSTC (A/B) | Q-RLSTC (C) | Q-RLSTC (D) |
| --- | --- | --- | --- | --- |
| Main signal | ΔOD = last_od − current_od (full IED) | α · od_improvement (proxy) | Same + DROP penalty | OD(s_t) − OD(s_{t+1}) (paper exact) |
| Boundary quality | None | β · boundary_sharpness | Same | None (paper doesn't use it) |
| Over-segmentation | Implicit via `MIN_SEGMENT_LEN` | Explicit −penalty in reward | Same + DROP micro-penalty (−0.05) | Implicit |
| SKIP reward | N/A | N/A | N/A | +0.05 × S (linear, low-variance segments) |
| Markov safety | Depends on global cluster state | Uses only incremental quantities | Same | Same |
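A minimal sketch of the A/B shaped reward, assuming illustrative coefficient values for the α, β, and penalty terms named above (the real values live in the configuration dataclasses):

```python
def reward_ab(od_improvement, boundary_sharpness, over_segmented,
              alpha=1.0, beta=0.5, seg_penalty=0.1):
    """Shaped reward for Versions A/B (sketch; coefficients are illustrative).

    alpha * od_improvement     -- proxy OD gain (main signal)
    beta  * boundary_sharpness -- boundary-quality bonus
    seg_penalty                -- explicit over-segmentation penalty
    """
    r = alpha * od_improvement + beta * boundary_sharpness
    if over_segmented:  # e.g. the new segment is shorter than MIN_SEGMENT_LEN
        r -= seg_penalty
    return r
```

The explicit penalty term is what the table contrasts with RLSTC's purely implicit `MIN_SEGMENT_LEN` guard.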
## 5. Quantum Circuit

| Aspect | Version A | Version B | Version C | Version D |
| --- | --- | --- | --- | --- |
| Qubits | 5 | 8 | 6 (5 + 1 shadow) | 5 |
| Encoding | Angle (RY) | Angle (RY) | Angle (RY) | Angle (RY) |
| Ansatz | HEA (RY-RZ + linear CNOT) | HEA | EQC (RZ + circular CNOT) | HEA (3 layers) |
| Variational layers | 2 | 2 | 2 | 3 |
| Trainable params | 20 | 32 | ~24 | 30 |
| Entanglement | 4 CNOTs (linear) | 7 CNOTs (linear) | 6 CNOTs (circular) | 4 CNOTs (linear) |
| Data re-uploading | Yes | Yes | Yes | Yes |
| Readout | ⟨Z₀⟩, ⟨Z₁⟩ | w·⟨Z⟩ + w·⟨ZZ⟩ | Softmax π(a\|s) | ⟨Z₀⟩, ⟨Z₁⟩, ⟨Z₂⟩ |
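The trainable-parameter counts above follow directly from qubits × layers × rotations per qubit per layer (two, RY-RZ, in the HEA; CNOTs carry no parameters). A quick sketch to check the A/B/D figures (Version C's EQC count is listed only approximately):

```python
def hea_param_count(n_qubits, n_layers, rotations_per_qubit=2):
    """Trainable parameters in a hardware-efficient ansatz (HEA): each
    variational layer applies RY-RZ (2 rotations) to every qubit, and the
    entangling CNOTs contribute no trainable parameters."""
    return n_qubits * n_layers * rotations_per_qubit

# Version A: 5 qubits x 2 layers x 2 rotations = 20
# Version B: 8 qubits x 2 layers x 2 rotations = 32
# Version D: 5 qubits x 3 layers x 2 rotations = 30
```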
## 6. Optimizer

| Aspect | RLSTC | Q-RLSTC (A/B/D) | Q-RLSTC (C) |
| --- | --- | --- | --- |
| Method | SGD with backprop | SPSA (gradient-free) | m-SPSA (momentum-averaged) |
| Evals per step | 1 (forward + backward) | 2 (forward only) | 2 + EMA smoothing |
| Shot-noise handling | N/A | Robust by design | Extra-robust via momentum |
| Gradient clipping | No | Yes (max norm 1.0) | Yes |
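A minimal SPSA sketch under stated assumptions (toy quadratic loss, standard decaying gain sequences; function names are illustrative): two loss evaluations per step yield the stochastic gradient estimate, and the step is clipped to max norm 1.0 as in the table above.

```python
import math
import random

def spsa_step(theta, loss_fn, a_k, c_k, rng, max_norm=1.0):
    """One SPSA update: evaluate the loss at theta +/- c_k * delta (two
    evaluations total), form the gradient estimate, clip, and step."""
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]          # Rademacher perturbation
    plus  = loss_fn([t + c_k * d for t, d in zip(theta, delta)])
    minus = loss_fn([t - c_k * d for t, d in zip(theta, delta)])
    grad = [(plus - minus) / (2.0 * c_k * d) for d in delta]  # d in {-1,+1}
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:                                       # gradient clipping
        grad = [g * max_norm / norm for g in grad]
    return [t - a_k * g for t, g in zip(theta, grad)]

# Usage sketch: minimise a toy quadratic with standard decaying gains.
rng = random.Random(0)
theta = [1.0, -0.8, 0.6]
loss = lambda th: sum(t * t for t in th)
for k in range(200):
    a_k = 0.2 / (k + 1) ** 0.602
    c_k = 0.1 / (k + 1) ** 0.101
    theta = spsa_step(theta, loss, a_k, c_k, rng)
```

The two-evaluation cost per step is independent of the parameter count, which is the scaling argument made in Section 12.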
## 7. Distance Computation

| Aspect | RLSTC | Q-RLSTC |
| --- | --- | --- |
| Primary metric | Incremental IED | IED (ported in `trajdistance.py`) + OD proxy |
| Per-step cost | O(1) amortised | O(1) |
| Full computation | Every CUT action | Episode end (k-means) or incremental update |
| Available metrics | IED, Fréchet, DTW | IED, Fréchet, DTW, OD, silhouette, F1 |
| Incremental updates | `cluster.py` | `classical_kmeans.py` (ported) |
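The O(1) per-step cost rests on the running-average form of the OD proxy (Section 2 notes it is a "running average, not global recalc"). A minimal sketch with hypothetical names:

```python
def update_od_proxy(mean_od, count, new_point_od):
    """Fold one point's OD contribution into the running average in O(1),
    instead of recomputing IED over the whole segment."""
    count += 1
    mean_od += (new_point_od - mean_od) / count
    return mean_od, count

# Usage: the incremental mean matches the batch mean over the same values.
mean, n = 0.0, 0
for od in [0.2, 0.4, 0.6]:
    mean, n = update_od_proxy(mean, n, od)
```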
## 8. Data Structures

| Structure | RLSTC | Q-RLSTC |
| --- | --- | --- |
| Point | Plain class with `x`, `y`, `t` | `@dataclass` with `distance()`, `to_array()` |
| Segment | Class with distance methods | Implicit (index range) |
| Trajectory | `Traj(points, size, ts, te)` | `@dataclass` with boundaries, labels |
| Replay buffer | `deque(maxlen=2000)` inside the DQN class | Separate `ReplayBuffer(5000)` |
| Cluster state | Mutable dict `{id: [data]}` | Same format (ported), `@dataclass ClusterState` |
| Config | Hardcoded constants | Nested `@dataclass` hierarchy |
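A sketch of the `Point` dataclass described above (field and method names from the table; the exact implementation in `point.py`'s Q-RLSTC counterpart may differ):

```python
from dataclasses import dataclass
import math

@dataclass
class Point:
    """Q-RLSTC point record: spatial coordinates plus a timestamp."""
    x: float
    y: float
    t: float

    def distance(self, other: "Point") -> float:
        """Euclidean distance in the spatial plane (time is ignored)."""
        return math.hypot(self.x - other.x, self.y - other.y)

    def to_array(self) -> list:
        """Flatten to [x, y, t] for downstream feature extraction."""
        return [self.x, self.y, self.t]
```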
## 9. Version Comparison Summary

| Dimension | A (Classical Parity) | B (Quantum Enhanced) | C (Next-Gen Q-RNN) | D (VLDB Aligned) |
| --- | --- | --- | --- | --- |
| Goal | Isolate quantum vs. classical | Explore parameter efficiency | Full quantum-native architecture | Strict VLDB paper reproduction |
| Qubits | 5 | 8 | 6 | 5 |
| Features | 5D (matches RLSTC) | 8D (3 quantum-native) | 5D + shadow memory | 5D (VLDB exact) |
| Readout | Single-qubit Z | Multi-observable (Z + ZZ) | Softmax distribution | Multi-qubit Z |
| Params | 20 | 32 | ~24 | 30 |
| Actions | 2 | 2 | 3 (+ DROP) | 2–3 (+ optional SKIP) |
| Agent | ε-greedy DQN | ε-greedy DQN | SAC | ε-greedy DQN |
| Optimizer | SPSA | SPSA | m-SPSA | SPSA |
| Shots | Fixed (512 / 4096) | Fixed | Adaptive (32 → 512) | Fixed |
| Config | `version="A"` | `version="B"` | `version="C"` | `version="D"` |
## 10. Noise & Hardware

| Aspect | RLSTC | Q-RLSTC |
| --- | --- | --- |
| Noise simulation | None | Full stack (ideal, simple, Eagle, Heron) |
| Error mitigation | None | Readout calibration matrix |
| Backend | CPU (TensorFlow) | Qiskit Aer (configurable) |
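Readout mitigation via a calibration matrix can be sketched as inverting the measured confusion matrix (shown here for a single qubit; the calibration values and function name are illustrative, not the implementation in `quantum/mitigation.py`):

```python
def mitigate_readout(p_meas, calib):
    """Invert a 2x2 readout calibration (confusion) matrix to recover the
    true outcome distribution: p_meas = M @ p_true  =>  p_true = M^-1 @ p_meas.
    calib[i][j] = P(measure i | prepared j)."""
    (a, b), (c, d) = calib
    det = a * d - b * c
    p0, p1 = p_meas
    true0 = (d * p0 - b * p1) / det
    true1 = (-c * p0 + a * p1) / det
    # Clamp tiny negative values caused by shot noise, then renormalise.
    true0, true1 = max(true0, 0.0), max(true1, 0.0)
    s = true0 + true1
    return [true0 / s, true1 / s]
```

In practice the calibration matrix is estimated by preparing and measuring each basis state on the backend before training.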
## 11. Training Pipeline

| Aspect | RLSTC | Q-RLSTC (A/B/D) | Q-RLSTC (C) |
| --- | --- | --- | --- |
| Loop | Iterate points → EXTEND/CUT | Same | Same + DROP/SKIP |
| Replay | Internal to DQN (2,000) | Separate buffer (5,000) | Same |
| Target update | Soft (τ = 0.05 every batch) | Hard copy every N episodes | Same |
| Double DQN | No | Yes | Yes |
| Anti-gaming | `MIN_SEGMENT_LEN` | Same + explicit reward penalty | Same + DROP penalty |
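The Double DQN update combines the online network (action selection) with the hard-copied target network (action evaluation); a minimal sketch with illustrative Q-functions:

```python
def double_dqn_target(reward, next_state, q_online, q_target,
                      gamma=0.99, done=False):
    """Double DQN target: the online network picks the greedy next action,
    the periodically hard-copied target network evaluates it."""
    if done:
        return reward
    q_next = q_online(next_state)
    best = max(range(len(q_next)), key=lambda a: q_next[a])
    return reward + gamma * q_target(next_state)[best]
```

Decoupling selection from evaluation is what removes the maximisation bias of the single-network DQN used in the original RLSTC.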
## 12. Design Rationale

| Decision | Rationale |
| --- | --- |
| Only the policy is quantum | Fixed I/O (5 → 2); distance computation needs O(1) updates |
| SPSA over parameter-shift | 2 evals per step vs. 40 (parameter-shift needs 2 per parameter); scales to larger circuits |
| Angle encoding | 1 feature → 1 qubit; bounded via arctan |
| Data re-uploading | Expressivity without extra depth; proven technique |
| Version A exists | Scientific control: isolate the function approximator |
| Version B exists | Explore whether more qubits + richer features help |
| Version C exists | Full quantum-native: shadow memory, EQC, SAC, adaptive shots |
| Version D exists | VLDB paper reproduction: exact MDP → VQC substitution |
| IED ported to Q-RLSTC | Classical parity: identical distance metric for fair comparison |
| MDL simplification ported | Ensures identical preprocessing between systems |
| Pickle loader | Direct data sharing between RLSTCcode and Q-RLSTC |
## 13. File Reference

### Classical RLSTC

| File | Purpose |
| --- | --- |
| `rl_nn.py` | DQN: model, training, target network |
| `MDP.py` | Environment: state features, reward, step logic |
| `rl_train.py` | Training-loop orchestration |
| `rl_estimate.py` | Evaluation / inference |
| `cluster.py` | Incremental IED clustering |
| `trajdistance.py` | IED, Fréchet, DTW distances |
| `segment.py` | Segment distance metrics |
| `point.py` | Point data structure |
| `preprocessing.py` | MDL simplification, normalisation |
### Q-RLSTC

| File | Purpose |
| --- | --- |
| `quantum/vqdqn_circuit.py` | Circuit: encoding, HEA, measurement |
| `rl/vqdqn_agent.py` | Agent: ε-greedy, Double DQN, target network |
| `rl/train.py` | Training loop + MDP environment |
| `rl/spsa.py` | SPSA optimizer |
| `rl/replay_buffer.py` | Experience replay |
| `config.py` | Configuration dataclasses (A/B/C/D) |
| `data/features.py` | State feature extraction (A, B, D) |
| `data/preprocessing.py` | MDL simplification + TRACLUS pipeline |
| `data/synthetic.py` | Trajectory generation |
| `clustering/classical_kmeans.py` | K-means + incremental cluster updates |
| `clustering/metrics.py` | OD, silhouette, F1 |
| `clustering/trajdistance.py` | IED, Fréchet, DTW (ported from RLSTC) |
| `clustering/pickle_loader.py` | Load RLSTCcode pickle data files |
| `quantum/backends.py` | Noise models (ideal, Eagle, Heron) |
| `quantum/mitigation.py` | Readout error mitigation |
| `experiments/run_cross_comparison.py` | Classical ↔ quantum comparison runner |
| `experiments/data_bridge.py` | RLSTCcode → Q-RLSTC data conversion |
Next: Noise & Hardware Simulation →