Distance Estimation & Clustering¶
The Distance Problem¶
Trajectory clustering requires comparing trajectories to cluster centroids repeatedly. Each training episode performs hundreds of distance computations. Q-RLSTC addresses this via incremental classical distance with an optional quantum swap test for research validation.
Integrated Euclidean Distance (IED)¶
IED is the primary metric, inherited from classical RLSTC and now fully ported to Q-RLSTC in trajdistance.py.
Geometric Interpretation¶
IED computes the area between two trajectories over their temporal overlap — the integral of spatial distance across a common time interval.
For aligned segments with time interval [tₛ, tₑ]:
d₁ = distance between starting points
d₂ = distance between ending points
IED ≈ 0.5 × (d₁ + d₂) × (tₑ − tₛ) (trapezoidal approximation)
Why IED?¶
| Metric | Pros | Cons | Use Case | Q-RLSTC Module |
|---|---|---|---|---|
| IED | Fast incremental updates, handles different lengths | Requires temporal overlap | Real-time trajectory clustering | traj2traj_ied() |
| Fréchet | Shape-invariant, robust | O(n²) computation | Offline analysis | FrechetDistance.compute() |
| DTW | Handles speed variation | Expensive, not incremental | Time series alignment | DtwDistance.compute() |
| Euclidean | Simple, fast | Requires same length | Fixed-size vectors only | np.linalg.norm |
IED Functions — trajdistance.py¶
The full IED computation is ported from RLSTCcode with the following key functions:
| Function | Purpose |
|---|---|
traj2traj_ied(pts1, pts2) |
Full IED between two trajectory point sequences |
incremental_ied(traj1, traj2, k_dict, k, i, sp_i) |
Incremental IED update (O(1) per step) |
incremental_mindist(traj_pts, start, curr, k_dict, cluster_dict) |
Find nearest cluster using incremental IED |
line2line_ied(p1s, p1e, p2s, p2e) |
Segment-pair distance building block |
get_static_ied(points, x, y, t1, t2) |
Static point-to-trajectory IED |
timed_traj(points, ts, te) |
Extract time-windowed sub-trajectory |
traj_mdl_comp(points, start, curr, mode) |
MDL cost computation (for preprocessing) |
Incremental Distance (from RLSTC)¶
The key optimisation: when extending a segment by one point, only the new trapezoidal area is added. This reduces distance computation from O(n²) to O(1) per step.
State maintained per cluster in k_dict:
- mid_dist — Distance for the overlapping portion
- real_dist — Total distance including non-overlapping portions
- lastp — Last processed point in the cluster centre
- j — Current index into the cluster centre
OD Proxy (Q-RLSTC)¶
Q-RLSTC also supports a lightweight surrogate for per-step reward computation:
# Segment centroid-based running average
od_proxy = (current_od * n_segments + segment_cost) / (n_segments + 1)
This is intentionally less precise than full IED — the reward only needs a directional signal (is clustering improving?), not exact distance values.
K-Means Clustering¶
When used: Episode-end evaluation only — not per-step.
Defined in classical_kmeans.py. Standard Lloyd's algorithm with k-means++ initialisation:
class ClassicalKMeans:
def fit(self, data):
centroids = self._initialize_centroids(data) # k-means++
for _ in range(max_iter):
labels = self._assign_clusters(data, centroids)
new_centroids = self._update_centroids(data, labels)
if max_shift < threshold:
break
centroids = new_centroids
return KMeansResult(centroids, labels, objective)
Evaluation Metrics¶
Defined in metrics.py:
| Metric | Measures | Usage |
|---|---|---|
| Overall Distance (OD) | Sum of distances to cluster centroids | Primary quality metric |
| Silhouette Score | Cluster cohesion vs. separation | Independent of boundary accuracy |
| Segmentation F1 | Boundary detection accuracy | Against ground-truth boundaries |
Incremental Cluster Management¶
Ported from RLSTCcode's cluster.py to provide online cluster updates during RL training. Defined in the lower half of classical_kmeans.py:
| Function | Purpose |
|---|---|
add_to_cluster(cluster_dict, id, sub_traj, dist) |
Assign a sub-trajectory to a cluster incrementally |
compute_center(cluster_dict, id) |
Recompute cluster center from time-indexed points |
update_all_centers(cluster_dict) |
Update all centers + reset accumulators (end of round) |
compute_overdist(cluster_dict) |
Compute overall distance: sum(dists) / count(segments) |
initialize_cluster_dict(n_clusters, centers) |
Create empty cluster dict with optional pre-computed centers |
Cluster Dictionary Format¶
cluster_dict[k] = [
distances_list, # [0] per-segment IED distances
segment_trajs, # [1] list of sub-trajectory Trajectory objects
center_points, # [2] center trajectory points (List[Point])
time_indexed_points, # [3] defaultdict(list) — points indexed by time
]
Data Loading — pickle_loader.py¶
Loads RLSTCcode-format pickle files directly. Defined in pickle_loader.py:
| Function | Purpose |
|---|---|
load_trajectories(path) |
Load Tdrive_norm_traj → List[Trajectory] |
load_raw_trajectories(path) |
Load as raw RLSTCcode Traj objects (for classical arm) |
load_cluster_centers(path) |
Load tdrive_clustercenter → (cluster_dict, baseline_od) |
load_cluster_centers_raw(path) |
Load in MDP.py's native dict format |
load_subtrajectories(path) |
Load TRACLUS sub-trajectories |
load_test_set(path) |
Load held-out test/validation sets |
MDL Preprocessing — preprocessing.py¶
The TRACLUS-style MDL simplification pipeline, ported from RLSTCcode. Defined in preprocessing.py:
| Function | Purpose |
|---|---|
simplify_trajectory(traj) |
Greedy MDL-based trajectory simplification |
simplify_all(trajectories) |
Apply simplification to all trajectories |
preprocess_tdrive(raw, ...) |
Full pipeline: filter → normalize → simplify |
filter_by_coordinates(trajs) |
Geographic bounding box filter (Beijing) |
normalize_locations(trajs) |
Z-score normalize spatial coordinates |
normalize_time(trajs) |
Z-score normalize timestamps |
Quantum Distance via Swap Test (Optional)¶
The swap test estimates inner products between amplitude-encoded states. Used only as a research validation probe — not in the training loop.
Algorithm¶
- Prepare ancilla qubit in |0⟩
- Hadamard on ancilla → |+⟩
- Controlled-SWAP between data registers
- Hadamard on ancilla
- Measure: P(0) = 0.5 × (1 + |⟨ψ|φ⟩|²)
Cost Analysis¶
| Requirement | Cost |
|---|---|
| Amplitude encoding per segment | O(n) gates |
| Re-encoding when segment grows | Full circuit rebuild |
| Qubits | 2 × log₂(d) + 1 ancilla |
| Shots for ±0.01 precision | ≥ 4,096 |
| Circuit evals per episode | ~100 per trajectory |
Verdict: At ~100ms per circuit evaluation, quantum distance adds ~10 seconds per episode vs. <1ms classical. Acceptable for end-of-episode verification, impractical for per-step reward.