Data Layer
Overview
The data layer (q_rlstc/data/) contains all trajectory data structures, distance metrics, the MDP environment, and preprocessing utilities. These modules were originally ported from RLSTCcode/subtrajcluster/ and have been fully documented with type hints and descriptive variable names.
Module Map
| Module | Purpose |
|---|---|
| rlstc_point.py | Point(x, y, t): GPS point with spatial coordinates and timestamp |
| rlstc_point_xy.py | Point_xy(x, y): 2D point for segment geometry (no timestamp) |
| rlstc_segment.py | Segment: directed line segment with perpendicular, parallel, and angle distances |
| rlstc_traj.py | Traj: ordered sequence of Points forming a trajectory |
| rlstc_trajdistance.py | IED, Fréchet, DTW, segment distance, and MDL simplification cost |
| rlstc_cluster.py | Incremental IED clustering and cluster-center maintenance |
| rlstc_mdp.py | TrajRLclus: MDP environment for RL-based trajectory segmentation |
| preprocessing.py | GPS filtering, length normalization, z-score normalization, MDL simplification |
| trajectory_scheduler.py | TrajectoryScheduler: controls trajectory sampling across epochs |
| dataset_loader.py | DatasetLoader (ABC), CustomDatasetLoader, DatasetConfig: abstract dataset loading interface |
Class Hierarchy
Point(x, y, t)
└─ Used by: Traj, Segment (via Point_xy), distance functions
Point_xy(x, y)
└─ Used by: Segment (2D geometry only)
Segment(start: Point_xy, end: Point_xy)
└─ perpendicular_distance(), parallel_distance(), angle_distance()
└─ Used by: IED computation, TRACLUS-style distance
Traj(points: List[Point], size, ts, te, traj_id)
└─ Primary data container for trajectories
└─ Used by: MDP, clustering, distance functions
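The relationships above can be illustrated with a minimal sketch. These dataclasses are stand-ins written for this page, not the code in q_rlstc/data/. The `from_points` helper is hypothetical, and deriving `size`, `ts`, and `te` from the point list (rather than passing them to the constructor) is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Point:
    """GPS sample: spatial coordinates plus a timestamp."""
    x: float
    y: float
    t: float


@dataclass(frozen=True)
class Point_xy:
    """2D point used for segment geometry (no timestamp)."""
    x: float
    y: float


@dataclass(frozen=True)
class Segment:
    """Directed line segment between two 2D points."""
    start: Point_xy
    end: Point_xy

    @classmethod
    def from_points(cls, a: Point, b: Point) -> "Segment":
        # Hypothetical helper: drop the timestamps to get pure geometry.
        return cls(Point_xy(a.x, a.y), Point_xy(b.x, b.y))


@dataclass
class Traj:
    """Ordered sequence of Points forming a trajectory."""
    points: List[Point]
    traj_id: int

    @property
    def size(self) -> int:
        return len(self.points)

    @property
    def ts(self) -> float:
        # Assumption: ts is the timestamp of the first point.
        return self.points[0].t

    @property
    def te(self) -> float:
        # Assumption: te is the timestamp of the last point.
        return self.points[-1].t


traj = Traj([Point(0.0, 0.0, 0.0), Point(3.0, 4.0, 10.0)], traj_id=7)
segment = Segment.from_points(traj.points[0], traj.points[1])
```

Keeping `Point_xy` separate from `Point` means segment geometry never has to carry or ignore a timestamp.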
Distance Metrics
IED (Integrated Euclidean Distance)
The primary distance metric. Integrates point-wise Euclidean distance over the overlapping temporal window of two trajectories, plus tail contributions for non-overlapping regions.
Incremental variant: Instead of recomputing from scratch each time the agent extends a sub-trajectory, incremental_sp / incremental_nsp cache partial distances in a k_dict dictionary.
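As a rough illustration of the overlap term only, the sketch below interpolates both trajectories linearly at every timestamp inside the shared window and integrates the point-wise distance with the trapezoid rule. It is an approximation written for this page: the function names are hypothetical, trajectories are plain `(x, y, t)` tuples rather than `Traj` objects, and the tail contributions and the `k_dict` caching used by `incremental_sp` / `incremental_nsp` are not modelled.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, t)


def position_at(traj: List[Sample], t: float) -> Tuple[float, float]:
    """Linearly interpolate the position of traj at time t."""
    for (x0, y0, t0), (x1, y1, t1) in zip(traj, traj[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return x0, y0
            w = (t - t0) / (t1 - t0)
            return x0 + w * (x1 - x0), y0 + w * (y1 - y0)
    raise ValueError("t lies outside the trajectory's time span")


def overlap_ied(a: List[Sample], b: List[Sample]) -> float:
    """Trapezoid-rule integral of Euclidean distance over the shared time window."""
    start = max(a[0][2], b[0][2])
    end = min(a[-1][2], b[-1][2])
    if start >= end:
        return 0.0  # no temporal overlap, so the overlap term vanishes
    # Evaluate at the window edges and at every sample time inside the window.
    times = sorted({start, end} | {t for _, _, t in a + b if start < t < end})
    dists = [math.dist(position_at(a, t), position_at(b, t)) for t in times]
    total = 0.0
    for k in range(len(times) - 1):
        total += 0.5 * (dists[k] + dists[k + 1]) * (times[k + 1] - times[k])
    return total


# Two parallel tracks 3 units apart, moving in lockstep for 10 time units.
track_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 10.0)]
track_b = [(0.0, 3.0, 0.0), (10.0, 3.0, 10.0)]
ied_overlap = overlap_ied(track_a, track_b)
```

For the two parallel tracks the distance is constant at 3 over a window of length 10, so the integral is 30.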
Fréchet Distance
The "dog-walking" distance — minimum leash length for two entities traversing curves simultaneously. Computed via dynamic programming.
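The dynamic program can be sketched with the standard discrete Fréchet recurrence. This is a generic textbook version over `(x, y)` tuples, offered as an illustration. The function name is hypothetical and the signature in rlstc_trajdistance.py may differ.

```python
import math
from typing import List, Tuple

XY = Tuple[float, float]


def discrete_frechet(p: List[XY], q: List[XY]) -> float:
    """Discrete Fréchet distance between two point sequences."""
    if not p or not q:
        raise ValueError("both sequences must be non-empty")
    n, m = len(p), len(q)
    # ca[i][j] = shortest leash that lets both walkers reach p[i] and q[j].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_previous = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_previous, d)
    return ca[-1][-1]


straight = [(0.0, 0.0), (4.0, 0.0)]
detour = [(0.0, 0.0), (2.0, 3.0), (4.0, 0.0)]
frechet_value = discrete_frechet(straight, detour)
```

The table has n × m cells, so time and memory are both O(nm).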
DTW (Dynamic Time Warping)
Classic elastic alignment distance. Each point in one trajectory can match multiple points in the other. Uses DP to minimize total aligned cost.
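A minimal sketch of the DP, using the classic recurrence with Euclidean point cost and no warping-window constraint. The function name and the `(x, y)` tuple input are assumptions for illustration, not the module's actual interface.

```python
import math
from typing import List, Tuple

XY = Tuple[float, float]


def dtw_distance(p: List[XY], q: List[XY]) -> float:
    """Total cost of the cheapest monotone alignment between p and q."""
    n, m = len(p), len(q)
    # cost[i][j] = best alignment cost of the first i points of p and first j of q.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(p[i - 1], q[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # p advances, q[j-1] is matched again
                cost[i][j - 1],      # q advances, p[i-1] is matched again
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


dense = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
sparse = [(0.0, 0.0), (2.0, 0.0)]
dtw_value = dtw_distance(dense, sparse)
```

Unlike Fréchet, which takes the maximum along the alignment, DTW sums the matched distances, so it grows with trajectory length.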
TRACLUS Segment Distance
Sum of three components between endpoint-defined line segments:
- Perpendicular: sideways separation
- Parallel: overshoot/undershoot
- Angle: directional divergence
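The three components can be sketched as follows, using the definitions commonly given for TRACLUS: the shorter segment is projected onto the line through the longer one. Whether the `Segment` methods match these formulas exactly is an assumption. The function names are hypothetical and segments are plain tuples of `(x, y)` endpoints.

```python
import math
from typing import Tuple

XY = Tuple[float, float]
Seg = Tuple[XY, XY]  # (start, end)


def _project(p: XY, seg: Seg) -> XY:
    """Orthogonal projection of p onto the infinite line through seg."""
    (sx, sy), (ex, ey) = seg
    dx, dy = ex - sx, ey - sy
    u = ((p[0] - sx) * dx + (p[1] - sy) * dy) / (dx * dx + dy * dy)
    return sx + u * dx, sy + u * dy


def traclus_components(a: Seg, b: Seg) -> Tuple[float, float, float]:
    """Return (perpendicular, parallel, angle) distances between two segments."""
    longer, shorter = (a, b) if math.dist(*a) >= math.dist(*b) else (b, a)
    len_long, len_short = math.dist(*longer), math.dist(*shorter)
    if len_long == 0.0:
        raise ValueError("at least one segment must have non-zero length")
    ps, pe = _project(shorter[0], longer), _project(shorter[1], longer)

    # Perpendicular: Lehmer mean of the two point-to-line offsets.
    l1, l2 = math.dist(shorter[0], ps), math.dist(shorter[1], pe)
    perpendicular = 0.0 if l1 + l2 == 0.0 else (l1 * l1 + l2 * l2) / (l1 + l2)

    # Parallel: smallest gap from a projected point to an endpoint of the longer segment.
    parallel = min(math.dist(proj, end) for proj in (ps, pe) for end in longer)

    # Angle: |shorter| * sin(theta) below 90 degrees, else the full shorter length.
    ux, uy = longer[1][0] - longer[0][0], longer[1][1] - longer[0][1]
    vx, vy = shorter[1][0] - shorter[0][0], shorter[1][1] - shorter[0][1]
    dot, cross = ux * vx + uy * vy, ux * vy - uy * vx
    angle = abs(cross) / len_long if dot >= 0.0 else len_short
    return perpendicular, parallel, angle


def traclus_distance(a: Seg, b: Seg) -> float:
    """Unweighted sum of the three components."""
    return sum(traclus_components(a, b))


base = ((0.0, 0.0), (10.0, 0.0))
offset = ((2.0, 1.0), (6.0, 1.0))
components = traclus_components(base, offset)
```

For `base` and `offset` the segments are parallel and one unit apart, so the angle term is zero and the distance is the perpendicular offset plus the two-unit gap to the nearest endpoint.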
MDP Environment
TrajRLclus manages the RL training loop. Key properties:
| Property | Value |
|---|---|
| State dimension | 5 (or 4 if ablate_odb=True) |
| Actions | 2: EXTEND (0) or CUT (1) |
| Reward | Δ(overall_sim) — decrease in mean IED |
| Min segment length | Configurable (min_seg_len, default 3) |
| Clusters | Separate T (training) and E (evaluation) dicts |
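To make the action and reward rows concrete, here is a toy sketch that does not use `TrajRLclus` itself. The helper names are hypothetical, and the way `min_seg_len` is enforced here (a CUT that would produce a too-short segment is ignored) is an assumption, not documented behaviour.

```python
from typing import List, Tuple

EXTEND, CUT = 0, 1  # action codes from the table above


def reward(prev_overall_sim: float, new_overall_sim: float) -> float:
    """Positive when the mean IED to cluster centers decreased."""
    return prev_overall_sim - new_overall_sim


def split_by_actions(
    num_points: int, actions: List[int], min_seg_len: int = 3
) -> List[Tuple[int, int]]:
    """Turn one action per interior point into (start, end) index pairs.

    Assumption: a CUT is honoured only if the segment being closed
    would contain at least min_seg_len points.
    """
    segments = []
    start = 0
    for index, action in enumerate(actions, start=1):  # interior points 1..n-2
        segment_len = index - start + 1
        if action == CUT and segment_len >= min_seg_len:
            segments.append((start, index))
            start = index  # the cut point also opens the next segment
    segments.append((start, num_points - 1))  # close the trailing segment
    return segments


actions = [CUT, EXTEND, CUT, EXTEND, EXTEND, CUT, EXTEND, EXTEND]
segments = split_by_actions(num_points=10, actions=actions)
step_reward = reward(prev_overall_sim=2.5, new_overall_sim=2.0)
```

In this run the first CUT arrives after only two points and is ignored, leaving three segments that share their cut points.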
State vector:
Preprocessing Pipeline
The preprocess_pipeline() function chains 5 steps matching the original RLSTCcode:
raw GPS data → filter by bounding box → enforce length limits
→ z-score normalize coordinates → MDL simplify → pickle output
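The chain can be sketched roughly as below. This is not `preprocess_pipeline()`: all function and parameter names are hypothetical. Dropping out-of-box points (rather than whole trajectories), truncating over-long trajectories, and normalizing with dataset-wide statistics are all assumptions, and the MDL simplification step is omitted.

```python
import pickle
import statistics
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, t)
Trajectory = List[Sample]
Box = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def filter_bounding_box(trajs: List[Trajectory], box: Box) -> List[Trajectory]:
    """Assumption: points outside the box are dropped individually."""
    x_min, x_max, y_min, y_max = box
    return [
        [(x, y, t) for x, y, t in traj if x_min <= x <= x_max and y_min <= y <= y_max]
        for traj in trajs
    ]


def enforce_length(trajs: List[Trajectory], min_len: int, max_len: int) -> List[Trajectory]:
    """Assumption: short trajectories are discarded, long ones truncated."""
    return [traj[:max_len] for traj in trajs if len(traj) >= min_len]


def zscore_normalize(trajs: List[Trajectory]) -> List[Trajectory]:
    """Standardize x and y with dataset-wide mean and standard deviation."""
    if not trajs:
        return []
    xs = [x for traj in trajs for x, _, _ in traj]
    ys = [y for traj in trajs for _, y, _ in traj]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    std_x = statistics.pstdev(xs) or 1.0  # avoid division by zero
    std_y = statistics.pstdev(ys) or 1.0
    return [
        [((x - mean_x) / std_x, (y - mean_y) / std_y, t) for x, y, t in traj]
        for traj in trajs
    ]


def preprocess_sketch(trajs: List[Trajectory], box: Box, min_len: int, max_len: int) -> bytes:
    """Chain the steps and return the pickled result (MDL simplification omitted)."""
    trajs = filter_bounding_box(trajs, box)
    trajs = enforce_length(trajs, min_len, max_len)
    trajs = zscore_normalize(trajs)
    return pickle.dumps(trajs)


raw = [
    [(0.0, 0.0, 0.0), (2.0, 2.0, 1.0), (100.0, 100.0, 2.0), (4.0, 4.0, 3.0)],
    [(1.0, 1.0, 0.0)],
]
blob = preprocess_sketch(raw, box=(0.0, 10.0, 0.0, 10.0), min_len=2, max_len=3)
processed = pickle.loads(blob)
```

Filtering before normalizing keeps outliers such as the `(100, 100)` point from skewing the mean and standard deviation.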