Distance Estimation & Clustering¶

The Distance Problem¶

Trajectory clustering requires comparing trajectories to cluster centroids repeatedly. Each training episode performs hundreds of distance computations. Q-RLSTC addresses this via incremental classical distance with an optional quantum swap test for research validation.

Integrated Euclidean Distance (IED)¶

IED is the primary metric, inherited from classical RLSTC and now fully ported to Q-RLSTC in trajdistance.py.

Geometric Interpretation¶

IED computes the area between two trajectories over their temporal overlap — the integral of spatial distance across a common time interval.

For aligned segments with time interval [tₛ, tₑ]:
    d₁ = distance between starting points
    d₂ = distance between ending points
    IED ≈ 0.5 × (d₁ + d₂) × (tₑ − tₛ)     (trapezoidal approximation)

Why IED?¶

Metric	Pros	Cons	Use Case	Q-RLSTC Module
IED	Fast incremental updates, handles different lengths	Requires temporal overlap	Real-time trajectory clustering	`traj2traj_ied()`
Fréchet	Shape-invariant, robust	O(n²) computation	Offline analysis	`FrechetDistance.compute()`
DTW	Handles speed variation	Expensive, not incremental	Time series alignment	`DtwDistance.compute()`
Euclidean	Simple, fast	Requires same length	Fixed-size vectors only	`np.linalg.norm`

IED Functions — `trajdistance.py`¶

The full IED computation is ported from RLSTCcode with the following key functions:

Function	Purpose
`traj2traj_ied(pts1, pts2)`	Full IED between two trajectory point sequences
`incremental_ied(traj1, traj2, k_dict, k, i, sp_i)`	Incremental IED update (O(1) per step)
`incremental_mindist(traj_pts, start, curr, k_dict, cluster_dict)`	Find nearest cluster using incremental IED
`line2line_ied(p1s, p1e, p2s, p2e)`	Segment-pair distance building block
`get_static_ied(points, x, y, t1, t2)`	Static point-to-trajectory IED
`timed_traj(points, ts, te)`	Extract time-windowed sub-trajectory
`traj_mdl_comp(points, start, curr, mode)`	MDL cost computation (for preprocessing)

Incremental Distance (from RLSTC)¶

The key optimisation: when extending a segment by one point, only the new trapezoidal area is added. This reduces distance computation from O(n²) to O(1) per step.

State maintained per cluster in k_dict: - mid_dist — Distance for the overlapping portion - real_dist — Total distance including non-overlapping portions - lastp — Last processed point in the cluster centre - j — Current index into the cluster centre

OD Proxy (Q-RLSTC)¶

Q-RLSTC also supports a lightweight surrogate for per-step reward computation:

# Segment centroid-based running average
od_proxy = (current_od * n_segments + segment_cost) / (n_segments + 1)

This is intentionally less precise than full IED — the reward only needs a directional signal (is clustering improving?), not exact distance values.

K-Means Clustering¶

When used: Episode-end evaluation only — not per-step.

Defined in classical_kmeans.py. Standard Lloyd's algorithm with k-means++ initialisation:

class ClassicalKMeans:
    def fit(self, data):
        centroids = self._initialize_centroids(data)     # k-means++
        for _ in range(max_iter):
            labels = self._assign_clusters(data, centroids)
            new_centroids = self._update_centroids(data, labels)
            if max_shift < threshold:
                break
            centroids = new_centroids
        return KMeansResult(centroids, labels, objective)

Evaluation Metrics¶

Defined in metrics.py:

Metric	Measures	Usage
Overall Distance (OD)	Sum of distances to cluster centroids	Primary quality metric
Silhouette Score	Cluster cohesion vs. separation	Independent of boundary accuracy
Segmentation F1	Boundary detection accuracy	Against ground-truth boundaries

Incremental Cluster Management¶

Ported from RLSTCcode's cluster.py to provide online cluster updates during RL training. Defined in the lower half of classical_kmeans.py:

Function	Purpose
`add_to_cluster(cluster_dict, id, sub_traj, dist)`	Assign a sub-trajectory to a cluster incrementally
`compute_center(cluster_dict, id)`	Recompute cluster center from time-indexed points
`update_all_centers(cluster_dict)`	Update all centers + reset accumulators (end of round)
`compute_overdist(cluster_dict)`	Compute overall distance: `sum(dists) / count(segments)`
`initialize_cluster_dict(n_clusters, centers)`	Create empty cluster dict with optional pre-computed centers

Cluster Dictionary Format¶

cluster_dict[k] = [
    distances_list,       # [0] per-segment IED distances
    segment_trajs,        # [1] list of sub-trajectory Trajectory objects
    center_points,        # [2] center trajectory points (List[Point])
    time_indexed_points,  # [3] defaultdict(list) — points indexed by time
]

Data Loading — `pickle_loader.py`¶

Loads RLSTCcode-format pickle files directly. Defined in pickle_loader.py:

Function	Purpose
`load_trajectories(path)`	Load `Tdrive_norm_traj` → List[Trajectory]
`load_raw_trajectories(path)`	Load as raw RLSTCcode Traj objects (for classical arm)
`load_cluster_centers(path)`	Load `tdrive_clustercenter` → (cluster_dict, baseline_od)
`load_cluster_centers_raw(path)`	Load in MDP.py's native dict format
`load_subtrajectories(path)`	Load TRACLUS sub-trajectories
`load_test_set(path)`	Load held-out test/validation sets

MDL Preprocessing — `preprocessing.py`¶

The TRACLUS-style MDL simplification pipeline, ported from RLSTCcode. Defined in preprocessing.py:

Function	Purpose
`simplify_trajectory(traj)`	Greedy MDL-based trajectory simplification
`simplify_all(trajectories)`	Apply simplification to all trajectories
`preprocess_tdrive(raw, ...)`	Full pipeline: filter → normalize → simplify
`filter_by_coordinates(trajs)`	Geographic bounding box filter (Beijing)
`normalize_locations(trajs)`	Z-score normalize spatial coordinates
`normalize_time(trajs)`	Z-score normalize timestamps

Quantum Distance via Swap Test (Optional)¶

The swap test estimates inner products between amplitude-encoded states. Used only as a research validation probe — not in the training loop.

Algorithm¶

Prepare ancilla qubit in |0⟩
Hadamard on ancilla → |+⟩
Controlled-SWAP between data registers
Hadamard on ancilla
Measure: P(0) = 0.5 × (1 + |⟨ψ|φ⟩|²)

Cost Analysis¶

Requirement	Cost
Amplitude encoding per segment	O(n) gates
Re-encoding when segment grows	Full circuit rebuild
Qubits	2 × log₂(d) + 1 ancilla
Shots for ±0.01 precision	≥ 4,096
Circuit evals per episode	~100 per trajectory

Verdict: At ~100ms per circuit evaluation, quantum distance adds ~10 seconds per episode vs. <1ms classical. Acceptable for end-of-episode verification, impractical for per-step reward.

Next: Classical vs. Quantum Justifications →