Mobile Traffic Camera Calibration from Road Geometry for UAV-Based Traffic Surveillance
Converting monocular oblique UAV traffic video into a local metric bird's-eye-view representation using visible road geometry — enabling deployable mobile traffic cameras without pre-installed calibration.
Abstract
Unmanned aerial vehicles can provide flexible traffic surveillance in locations where fixed roadside cameras are unavailable, costly, or impractical to install. However, raw UAV video is difficult to use for traffic analytics because vehicle motion is observed in perspective image coordinates rather than in a stable metric road coordinate system. This work presents a lightweight pipeline for converting monocular oblique UAV traffic video into a local metric bird's-eye-view (BEV) representation. The method uses visible road geometry (lane markings, road borders, and crosswalks) to estimate a road-plane homography from image coordinates to metric ground-plane coordinates. Vehicle observations are projected to BEV using estimated ground contact points, producing metric trajectories, speed, heading, and dynamic 3D cuboids on the road plane.
Processing Pipeline
A perception-style pipeline converts raw UAV footage into a semantic-geometric 3D scene — closer to autonomous-driving BEV perception than photogrammetric 3D reconstruction.
Frame sampling
Sample a subsequence from the UAV video and map each sampled frame to its original frame index to prevent annotation drift.
Road-plane calibration
Estimate image-to-ground homography from visible road geometry — lane markings, road borders, crosswalk corners.
Vehicle projection
Approximate ground contact points from 2D bounding boxes; project to metric road-plane coordinates via homography.
Trajectory & speed
Sort ground positions by frame; estimate velocity by finite differences; compute speed and heading.
3D cuboid generation
Place oriented metric cuboids on the road plane using class-dependent dimension priors and road-axis-aware heading.
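The trajectory and speed step above can be illustrated with a minimal sketch. It assumes each track is a list of `(frame_index, x_m, y_m)` tuples in original frame indices and that the video frame rate `fps` is known; the function name and tuple layout are hypothetical, not taken from the project code.

```python
import math

def speeds_and_headings(track, fps):
    """Finite-difference speed and heading for one vehicle track.

    track: list of (frame_index, x_m, y_m), ground-plane positions in meters.
    fps:   frame rate of the ORIGINAL video (assumed known).
    Returns a list of (frame_index, speed_mps, heading_rad), one entry per
    consecutive pair of observations.
    """
    ordered = sorted(track)  # sort ground positions by frame index
    result = []
    for (f0, x0, y0), (f1, x1, y1) in zip(ordered, ordered[1:]):
        dt = (f1 - f0) / fps  # elapsed time in seconds
        if dt <= 0:
            continue  # skip duplicate frame indices
        vx = (x1 - x0) / dt
        vy = (y1 - y0) / dt
        speed = math.hypot(vx, vy)      # meters per second
        heading = math.atan2(vy, vx)    # radians, measured from the +x axis
        result.append((f1, speed, heading))
    return result

# Example: a vehicle moving along +y, observed every 30 original frames.
example = speeds_and_headings([(0, 0.0, 0.0), (30, 0.0, 10.0), (60, 0.0, 30.0)], fps=30.0)
```

Because `dt` is computed from original frame indices, the result stays correct when the sampling step changes.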
Road-Geometry Calibration
The mobile-camera calibration is defined from visible road geometry. The workflow supports human-in-the-loop calibration: the operator selects visible road-plane points and assigns metric coordinates. Red points are active correspondences; cyan points provide additional visual checks.
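As an illustration of how operator-selected correspondences can be turned into a road-plane homography, here is a minimal sketch using the standard direct linear transform (DLT) with NumPy. This is not necessarily the estimator used in the project: the function names, the example pixel coordinates, and the 3.5 m by 20 m ground rectangle are assumptions, and the sketch omits coordinate normalization and outlier handling.

```python
import numpy as np

def estimate_homography(image_pts, ground_pts):
    """Estimate a 3x3 homography mapping image pixels to ground-plane meters.

    image_pts, ground_pts: matching sequences of (x, y) pairs, at least four,
    with no three points collinear.
    """
    image_pts = np.asarray(image_pts, dtype=float)
    ground_pts = np.asarray(ground_pts, dtype=float)
    if image_pts.shape != ground_pts.shape or image_pts.shape[0] < 4:
        raise ValueError("need at least 4 matching point pairs")
    rows = []
    for (u, v), (x, y) in zip(image_pts, ground_pts):
        # Two linear constraints per correspondence (standard DLT rows).
        rows.append([-u, -v, -1, 0, 0, 0, x * u, x * v, x])
        rows.append([0, 0, 0, -u, -v, -1, y * u, y * v, y])
    a = np.array(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(a)
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # fix the scale (assumes h[2, 2] is not near zero)

def project(h, pts):
    """Apply homography h to an array of (x, y) points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ h.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical correspondences: four corners of one lane segment.
image_pts = [(300, 700), (980, 700), (720, 300), (560, 300)]   # pixels
ground_pts = [(0.0, 0.0), (3.5, 0.0), (3.5, 20.0), (0.0, 20.0)]  # meters
H = estimate_homography(image_pts, ground_pts)
```

With more than four points the same code returns a least-squares fit, and points held out of the fit can be reprojected with `project` as a visual check.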

Synchronized Output
The strongest demo view is the synchronized triptych: original UAV frame, metric BEV view, and dynamic 3D cuboid scene. This representation separates image appearance from metric traffic state.
Metric BEV Tracks
Vehicle tracks projected from oblique image coordinates into a local metric road-plane coordinate system. Trajectories are expressed in calibrated meters rather than perspective image pixels.
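The projection of a single detection can be sketched as below. The sketch assumes boxes are given as `(left, top, width, height)` in pixels and that the homography is a 3x3 nested list; the matrix values in the example are made up purely for illustration.

```python
def bbox_ground_point(left, top, width, height):
    """Approximate the ground contact point as the bottom-center of the box."""
    return (left + width / 2.0, top + height)

def apply_homography(h, u, v):
    """Map one image point (u, v) to ground-plane coordinates in meters."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity (on the horizon line)")
    return (x / w, y / w)

# Hypothetical homography and detection, for illustration only.
H_EXAMPLE = [
    [0.05, 0.0, -10.0],
    [0.0, 0.1, -5.0],
    [0.0, -0.001, 1.0],
]
u, v = bbox_ground_point(400, 300, 40, 20)      # bottom-center pixel
ground_x, ground_y = apply_homography(H_EXAMPLE, u, v)
```

The division by `w` is what converts perspective pixel motion into metric coordinates; dropping it would leave the track in a distorted affine frame.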


Dynamic 3D Cuboids
A lightweight traffic-state representation — not a photorealistic reconstruction. Each vehicle is an oriented metric cuboid with class-dependent dimensions and road-axis-aware heading.


Vehicle Dimension Priors
Cuboid dimensions are based on class-level priors. In the M1401 sequence, all annotated vehicles correspond to the car category.
| Class | Length (m) | Width (m) | Height (m) |
|---|---|---|---|
| Car | 4.5 | 1.8 | 1.5 |
| Truck | 8.0 | 2.5 | 3.0 |
| Bus | 12.0 | 2.6 | 3.2 |
| Generic vehicle | 4.5 | 1.8 | 1.5 |
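A minimal sketch of how these priors could be turned into cuboid geometry follows. It assumes the projected position is treated as the center of the vehicle footprint and that heading is measured in radians from the +x axis of the ground frame; both conventions, and all names in the code, are assumptions for illustration.

```python
import math

# Class-level (length, width, height) priors in meters, from the table above.
DIMENSION_PRIORS = {
    "car": (4.5, 1.8, 1.5),
    "truck": (8.0, 2.5, 3.0),
    "bus": (12.0, 2.6, 3.2),
    "generic": (4.5, 1.8, 1.5),
}

def cuboid_corners(cx, cy, heading, vehicle_class="generic"):
    """Return the 8 corners (x, y, z) of an oriented cuboid on the road plane.

    (cx, cy): footprint center in meters (assumed convention).
    heading:  yaw in radians, 0 along +x, counter-clockwise positive.
    Corners are ordered bottom face first (z = 0), then top face.
    """
    length, width, height = DIMENSION_PRIORS.get(
        vehicle_class, DIMENSION_PRIORS["generic"]
    )
    c, s = math.cos(heading), math.sin(heading)
    half_l, half_w = length / 2.0, width / 2.0
    footprint = [(half_l, half_w), (half_l, -half_w),
                 (-half_l, -half_w), (-half_l, half_w)]
    corners = []
    for z in (0.0, height):
        for dx, dy in footprint:
            # Rotate the local offset by the heading, then translate.
            corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c, z))
    return corners

car_box = cuboid_corners(10.0, 25.0, math.pi / 2, "car")
```

Unknown class names fall back to the generic prior, which matches the car dimensions in the table.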
Key Findings
Experimental observations from the UAVDT M1401 evaluation.
Automatic calibration is viable with human refinement
Human-in-the-loop calibration provides a moderate increase in geometric accuracy over fully automatic estimation, particularly for lane alignment and far-field consistency. However, automatic calibration already produces usable results and can be further improved — making it a practical option for real-world deployment scenarios where manual intervention is not feasible.
Far-field vehicles are highly sensitive to homography errors
Distant vehicles occupy fewer pixels and lie near the vanishing region of the perspective projection. Small image-coordinate or lane-width errors can produce large lateral shifts in metric coordinates, making far-field drift a key diagnostic for calibration quality.
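A back-of-envelope model shows why the error grows with distance. Assume an idealized pinhole camera at height $h$ above a flat road, viewing a ground point at horizontal distance $d$ along a ray with depression angle $\theta$ below the horizon. These symbols are introduced here for illustration and do not appear in the pipeline itself.

$$
d = \frac{h}{\tan\theta}
\qquad\Longrightarrow\qquad
\frac{\partial d}{\partial \theta} = -\frac{h}{\sin^{2}\theta} = -\frac{h^{2} + d^{2}}{h}
$$

A fixed angular error, such as one pixel of misplacement, therefore shifts the along-road position by an amount that grows roughly as $d^{2}/h$ in the far field. In the cross-road direction, a small angular error $\delta$ of the viewing ray shifts the ground point laterally by about $\delta\sqrt{h^{2} + d^{2}}$, which grows linearly with distance. For example, with $h = 50$ m the along-road sensitivity is $100$ m/rad at $d = 50$ m but $850$ m/rad at $d = 200$ m.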
Road-axis-aware heading reduces cuboid jitter
Raw motion-based heading is noisy due to annotation jitter and frame sampling. Regularizing headings toward a dominant road axis produces more plausible cuboid orientation on straight segments.
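One simple way to regularize toward a road axis is to blend each raw heading with the nearest direction of travel along that axis. The sketch below illustrates this idea only; the blending rule and the `strength` parameter are assumptions, not the project's documented method.

```python
import math

def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def regularize_heading(raw_heading, road_axis, strength=0.8):
    """Pull a noisy heading toward the dominant road axis.

    raw_heading: motion-based heading in radians.
    road_axis:   orientation of the road axis in radians. The axis is
                 undirected, so traffic may flow along it in either direction.
    strength:    0 keeps the raw heading, 1 snaps fully to the axis
                 (hypothetical tuning parameter).
    """
    diff = wrap_angle(raw_heading - road_axis)
    axis_direction = road_axis
    if abs(diff) > math.pi / 2.0:
        # The vehicle travels against the stored axis direction: flip it.
        axis_direction = wrap_angle(road_axis + math.pi)
        diff = wrap_angle(raw_heading - axis_direction)
    return wrap_angle(axis_direction + (1.0 - strength) * diff)

smoothed = regularize_heading(0.2, 0.0, strength=0.5)
```

Flipping the axis for opposing traffic keeps vehicles in both directions aligned with the road without forcing them to face the same way. On curved segments a single global axis would no longer be appropriate.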
Frame mapping is critical for trajectory fidelity
Using an incorrect frame step caused annotation boxes to move approximately twice as fast as visible vehicles. Every sampled frame must be associated with its original frame index through an explicit mapping table.
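An explicit mapping table of this kind can be sketched in a few lines. The sketch assumes 1-based original frame indices and annotations stored in a dictionary keyed by original frame index; both are illustrative assumptions, as are the function names.

```python
def build_frame_map(num_original_frames, step, first_frame=1):
    """Return {sampled_index: original_frame_index} for a fixed sampling step."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    originals = range(first_frame, first_frame + num_original_frames, step)
    return dict(enumerate(originals))

def boxes_for_sample(sampled_index, frame_map, annotations):
    """Look up annotation boxes through the mapping table.

    annotations: dict mapping ORIGINAL frame index to a list of boxes.
    Indexing annotations directly by sampled_index would pair frames with
    the wrong boxes whenever step != 1.
    """
    return annotations.get(frame_map[sampled_index], [])

frame_map = build_frame_map(num_original_frames=10, step=2)
# frame_map == {0: 1, 1: 3, 2: 5, 3: 7, 4: 9}
```

Storing the table alongside the sampled frames makes the step used at extraction time auditable later, instead of relying on a step value re-entered by hand.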
Why not full 3D reconstruction?
Moving vehicles are difficult for classical 3D reconstruction or 3D Gaussian Splatting because they violate static-scene assumptions. Each vehicle may appear in only a small number of frames and its position changes independently of the camera. For traffic analytics, detailed mesh reconstruction is unnecessary — a metric cuboid representation is more practical, faster, and better aligned with traffic-surveillance needs.
Known Limitations
Planar road assumption
The method assumes the road is locally planar. Curved or multi-level road geometry is not modeled.
Approximate cuboids
Vehicle cuboids use class-level dimension priors rather than measured per-vehicle geometry.
Calibration sensitivity
Homography quality depends strongly on calibration point placement. Small errors cause large far-field displacements.
Ground-point approximation
Bottom-center ground contact is an approximation. In oblique views the bounding box may include roof pixels or shadows, which biases the projected position.
Single-plane constraint
A road-plane homography cannot reconstruct vertical geometry such as poles, signs, or barriers.
Manual validation required
Human-in-the-loop calibration is currently more reliable than fully automatic methods.
Dataset
The pipeline was developed and tested using sequences from the UAVDT benchmark, which provides annotated UAV traffic videos under challenging conditions including camera motion, high density, small object size, occlusion, and scale variation.
Interested in this research?
We are open to collaborations on UAV perception, BEV reconstruction, and embedded traffic analytics.
