# Debugging Guide

## Common Failure Modes

### 1. Q-Values Stuck at Zero
Symptoms: Both Q(EXTEND) and Q(CUT) return ~0 for all states.

Causes:

- Variational parameters initialised symmetrically → no gradient signal
- Learning rate too low → SPSA steps too small to escape
- Shot count too low → expectations drowned in noise
Fix:

```python
import numpy as np

# Randomise initial parameters
params = np.random.uniform(-np.pi, np.pi, size=n_params)

# Increase initial learning rate
spsa_config.a = 0.2  # default 0.12

# Verify expectation values manually
circuit = builder.build(state, params)
counts = backend.run(circuit, shots=4096).result().get_counts()
print(compute_expectation_from_counts(counts, 4096, qubit=0, n_qubits=5))
```
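If `compute_expectation_from_counts` is unavailable, the same manual check can be done with a standalone helper. This is a minimal sketch, not the project's implementation; the name `z_expectation_from_counts` is made up here, and it assumes Qiskit's bitstring convention (qubit 0 is the rightmost character of each counts key):

```python
def z_expectation_from_counts(counts, shots, qubit, n_qubits):
    """Estimate <Z> on one qubit from a measurement counts dict."""
    total = 0.0
    for bitstring, n in counts.items():
        bit = bitstring[n_qubits - 1 - qubit]  # qubit 0 = rightmost char
        total += n if bit == "0" else -n
    return total / shots

# Synthetic counts: qubit 0 reads '0' in 3072 of 4096 shots -> <Z> = 0.5
counts = {"00000": 3072, "00001": 1024}
print(z_expectation_from_counts(counts, 4096, qubit=0, n_qubits=5))  # 0.5
```

If this prints values pinned near zero for every state, the problem is upstream of the agent (circuit or measurement), not in the RL loop.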
### 2. Agent Always Extends (Never Cuts)

Symptoms: 0-1 segments per trajectory. F1 near zero.

Causes:

- Reward for EXTEND > reward for CUT (penalty too high)
- ε-greedy exploration exhausted before the CUT value is learned
- `MIN_SEGMENT_LEN` too large relative to trajectory length
Fix:

- Reduce `SEGMENT_PENALTY` (e.g., 0.1 → 0.01)
- Slow the epsilon decay: `epsilon_decay = 0.995`
- Check that boundary sharpness is being computed correctly
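To see why a slower decay matters, compare how quickly exploration collapses under two rates. A minimal sketch, assuming the usual multiplicative decay with a floor (the names `eps_start` and `eps_min` are illustrative):

```python
def epsilon_schedule(decay, episodes, eps_start=1.0, eps_min=0.05):
    """Return the per-episode epsilon values under multiplicative decay."""
    eps = eps_start
    trace = []
    for _ in range(episodes):
        trace.append(eps)
        eps = max(eps_min, eps * decay)
    return trace

fast = epsilon_schedule(0.95, 200)
slow = epsilon_schedule(0.995, 200)
print(f"episode 100: fast={fast[100]:.3f}, slow={slow[100]:.3f}")
```

With `decay=0.95` the agent is already at the exploration floor by episode 100; with `0.995` it still explores more than half the time, leaving room to discover the CUT value.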
### 3. Agent Always Cuts (Over-Segmentation)

Symptoms: 50+ segments per trajectory. High OD.
Causes:
- Boundary sharpness reward dominates OD improvement
- No minimum segment length enforcement
- `MAX_SEGMENTS` not enforced
Fix:

- Verify the `MIN_SEGMENT_LEN` constraint is triggering
- Increase `SEGMENT_PENALTY` or decrease `beta` (sharpness weight)
- Add logging: `if action == CUT: print(f"seg_len={seg_len}, sharpness={sharpness}")`
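One way to make the length constraint impossible to miss is to mask illegal actions before the greedy argmax, so over-segmentation cannot occur by construction. A minimal sketch (the function name and dict-based Q-values here are illustrative, not the project API):

```python
EXTEND, CUT = 0, 1

def masked_action(q_values, seg_len, min_segment_len=5,
                  n_segments=0, max_segments=50):
    """Greedy action with hard constraints applied before argmax."""
    if seg_len < min_segment_len:
        return EXTEND  # CUT is illegal: current segment too short
    if n_segments >= max_segments:
        return EXTEND  # CUT is illegal: segment budget exhausted
    return CUT if q_values[CUT] > q_values[EXTEND] else EXTEND

print(masked_action({EXTEND: 0.1, CUT: 0.9}, seg_len=3))  # 0 (EXTEND, masked)
print(masked_action({EXTEND: 0.1, CUT: 0.9}, seg_len=8))  # 1 (CUT)
```

Masking keeps the constraint out of the reward function entirely, which is often easier to debug than tuning penalties.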
### 4. Training Loss Not Decreasing

Symptoms: TD loss oscillates without improvement.

Causes:

- SPSA perturbation too large → gradient estimates are essentially random
- Target network stale → Q-targets are wrong
- Replay buffer too small → overfitting to recent experiences
Fix:

```python
# Smaller perturbation
spsa_config.c = 0.05

# More frequent target updates
rl_config.target_update_freq = 5

# Larger buffer
rl_config.memory_size = 20_000
```
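To judge whether `c` is reasonable, it helps to recall what the SPSA estimator actually computes: a single two-evaluation finite difference along a random ±1 direction, averaged over steps. A self-contained sketch on a noiseless quadratic (the function names are illustrative):

```python
import numpy as np

def spsa_gradient(loss_fn, params, c, rng):
    """One SPSA gradient estimate: two loss evaluations, any dimension."""
    delta = rng.choice([-1.0, 1.0], size=params.shape)
    g = (loss_fn(params + c * delta) - loss_fn(params - c * delta)) / (2 * c)
    return g * delta  # elementwise g / delta_i, since delta_i = +/-1

rng = np.random.default_rng(0)
loss = lambda p: float(np.sum(p ** 2))  # true gradient is 2p
params = np.array([1.0, -2.0])
est = np.mean([spsa_gradient(loss, params, c=0.05, rng=rng)
               for _ in range(200)], axis=0)
print(est)  # close to the true gradient [2, -4]
```

On a shot-based backend the loss values themselves are noisy; the finite difference divides that noise by `2c`, so a `c` that is too small amplifies shot noise while a `c` that is too large biases the estimate. `c = 0.05` trades between the two.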
### 5. Noise Crashes Training (Noisy Backend)

Symptoms: Works on the ideal simulator, diverges on Eagle/Heron.

Causes:

- Readout errors systematically bias Q-values
- Decoherence reduces circuit fidelity below a usable threshold
- SPSA gradient estimates too noisy
Fix:

- Enable readout mitigation: `noise_config.use_mitigation = True`
- Increase shots: 512 → 2048
- Reduce circuit depth (fewer layers)
- Try a less aggressive noise model first (simple → Eagle)
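`noise_config.use_mitigation` is the project's flag; the idea behind matrix-inversion readout mitigation can be sketched independently. For a single qubit, the observed outcome distribution is the true one multiplied by a confusion matrix, so inverting that matrix undoes the bias (the error rates below are made up for illustration):

```python
import numpy as np

# Confusion matrix: M[i, j] = P(measure i | prepared j)
p01, p10 = 0.02, 0.05  # assumed readout error rates (illustrative)
M = np.array([[1 - p10, p01],
              [p10, 1 - p01]])

raw = np.array([0.90, 0.10])      # observed outcome probabilities
mitigated = np.linalg.solve(M, raw)
print(mitigated)                  # corrected probabilities, still sum to 1
```

This is why unmitigated readout error "systematically" biases Q-values: the shift is deterministic, not zero-mean, so more shots alone cannot remove it.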
## Diagnostic Functions

### Circuit Inspection
```python
import numpy as np

from q_rlstc.quantum.vqdqn_circuit import VQDQNCircuitBuilder

builder = VQDQNCircuitBuilder(n_qubits=5, n_layers=2)
state = np.array([0.5, 0.3, -0.2, 0.1, 0.8])
params = np.random.uniform(-np.pi, np.pi, size=20)

circuit = builder.build(state, params)
print(circuit.draw())                 # text diagram
print(f"Depth: {circuit.depth()}")
print(f"Gates: {circuit.count_ops()}")
```
### Expectation Value Sanity Check

```python
# The all-zero state should give known expectations
state = np.zeros(5)
params = np.zeros(20)
circuit = builder.build(state, params)
# With zero params and zero state: ⟨Z⟩ should be +1 for all qubits
```
### Replay Buffer Inspection

```python
buffer = replay_buffer
recent = buffer.buffer[-5:]
for exp in recent:
    print(f"s={exp.state}, a={exp.action}, r={exp.reward:.4f}, done={exp.done}")
```
## Extension Points

### Adding a New Feature (Version B)

- Add the computation in `StateFeatureExtractorB._compute`
- Update `n_features` → increase the qubit count in the config
- Update `VQDQNConfig.n_qubits` and `n_params`
- Re-run the circuit depth analysis
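The steps above can be sketched end-to-end. The base class here is a stub standing in for the real one in q_rlstc, and the extra feature (total turning angle over the window) is purely illustrative:

```python
import numpy as np

class StateFeatureExtractor:          # stub for the real project base class
    n_features = 5

class StateFeatureExtractorB(StateFeatureExtractor):
    n_features = 6  # one extra feature -> bump n_qubits/n_params in VQDQNConfig

    def _compute(self, window):
        """window: (T, 2) array of trajectory points."""
        deltas = np.diff(window, axis=0)
        speeds = np.linalg.norm(deltas, axis=1)
        headings = np.arctan2(deltas[:, 1], deltas[:, 0])
        turning = np.abs(np.diff(headings)).sum()  # the new feature
        return np.array([speeds.mean(), speeds.std(), speeds.min(),
                         speeds.max(), turning, speeds[-1]])

window = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0]])
features = StateFeatureExtractorB()._compute(window)
print(features.shape)  # (6,)
```

The key invariant is that `n_features` and the returned vector length stay in sync, since the state vector is angle-encoded one feature per qubit.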
### Adding a New Noise Model

- Define in `backends.py → get_noise_model()`
- Add gate errors, thermal relaxation, readout errors
- Test with `pytest tests/test_noise_models.py`
### Swapping the Optimizer

- Implement the `step(params, loss_fn)` interface
- Replace `SPSAOptimizer` in `VQDQNAgent.__init__`
- Ensure gradient estimation is compatible with shot-based evaluation
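As a sketch of the `step(params, loss_fn)` contract, here is a central-difference optimizer that could stand in for `SPSAOptimizer` on small problems. The class name is made up; note the cost caveat in the docstring, which is exactly the shot-based-evaluation compatibility concern above:

```python
import numpy as np

class FiniteDifferenceOptimizer:
    """Drop-in alternative exposing the same step(params, loss_fn) interface.

    Central differences need 2 * len(params) loss evaluations per step,
    so this is only practical for small parameter counts or ideal
    simulation; SPSA needs two evaluations regardless of dimension.
    """
    def __init__(self, lr=0.1, eps=1e-3):
        self.lr, self.eps = lr, eps

    def step(self, params, loss_fn):
        grad = np.zeros_like(params)
        for i in range(len(params)):
            e = np.zeros_like(params)
            e[i] = self.eps
            grad[i] = (loss_fn(params + e) - loss_fn(params - e)) / (2 * self.eps)
        return params - self.lr * grad

opt = FiniteDifferenceOptimizer(lr=0.3)
params = np.array([1.0, -2.0])
for _ in range(50):
    params = opt.step(params, lambda p: float(np.sum(p ** 2)))
print(params)  # converges toward [0, 0]
```

Any replacement that takes the current parameters and a callable loss and returns updated parameters will slot into the agent the same way.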
### Adding a New Distance Metric

- Implement in `clustering/metrics.py`
- Use in `MDPEnvironment._compute_reward_components()`
- Update state features if the metric provides a useful state signal
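As one example of a metric that could live in `clustering/metrics.py`, here is the symmetric Hausdorff distance between two trajectory segments; whether it improves the reward signal here is an open question, and the function name is illustrative:

```python
import numpy as np

def hausdorff(a, b):
    """Symmetric Hausdorff distance between point sets a: (n, 2), b: (m, 2)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # pairwise
    return max(d.min(axis=1).max(), d.min(axis=0).max())

seg_a = np.array([[0.0, 0.0], [1.0, 0.0]])
seg_b = np.array([[0.0, 1.0], [1.0, 1.0]])
print(hausdorff(seg_a, seg_b))  # 1.0
```

The pairwise-distance matrix is O(n·m) in memory, so for long segments a chunked or early-exit variant would be preferable.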