Embedded Intelligence Lab
EMILAB
Research report

Mobile Traffic Camera Calibration from Road Geometry for UAV-Based Traffic Surveillance

Converting monocular oblique UAV traffic video into a local metric bird's-eye-view representation using visible road geometry — enabling deployable mobile traffic cameras without pre-installed calibration.

A. Popov, N. Trukhina, V. Vashkelis | May 2026 | arXiv:2605.11900
Dynamic 3D cuboid scene reconstructed from UAV traffic video

Abstract

Unmanned aerial vehicles can provide flexible traffic surveillance in locations where fixed roadside cameras are unavailable, costly, or impractical to install. However, raw UAV video is difficult to use for traffic analytics because vehicle motion is observed in perspective image coordinates rather than in a stable metric road coordinate system. This work presents a lightweight pipeline for converting monocular oblique UAV traffic video into a local metric bird's-eye-view representation. The method uses visible road geometry — lane markings, road borders, and crosswalks — to estimate a road-plane homography from image coordinates to metric ground-plane coordinates. Vehicle observations are projected to BEV using estimated ground contact points, producing metric trajectories, vehicle direction, speed, heading, and dynamic 3D cuboids on the road plane.

Processing Pipeline

A perception-style pipeline converts raw UAV footage into a semantic-geometric 3D scene — closer to autonomous-driving BEV perception than photogrammetric 3D reconstruction.

01

Frame sampling

Sample a subsequence of frames from the UAV video; map each sampled frame to its original frame index to prevent annotation drift.

02

Road-plane calibration

Estimate the image-to-ground homography from visible road geometry: lane markings, road borders, and crosswalk corners.

03

Vehicle projection

Approximate ground contact points from 2D bounding boxes and project them to metric road-plane coordinates via the homography.

04

Trajectory & speed

Sort ground positions by frame; estimate velocity by finite differences; compute speed and heading.

05

3D cuboid generation

Place oriented metric cuboids on the road plane using class-dependent dimension priors and road-axis-aware heading.
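Steps 03 and 04 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the report's implementation: boxes are assumed to be `(left, top, width, height)` in pixels with image y pointing down, the homography `H` is a 3x3 nested list mapping pixels to meters, detections carry their original frame index, and the frame rate `fps` and all function names are hypothetical. Each velocity is a backward difference to the previous observation of the same track.

```python
import math

def ground_contact(box):
    """Bottom-center of an axis-aligned (left, top, width, height) box; y points down."""
    left, top, w, h = box
    return left + w / 2.0, top + h

def to_ground(H, u, v):
    """Apply a 3x3 image-to-ground homography (nested lists) and dehomogenize."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to the line at infinity")
    return x / w, y / w

def track_kinematics(H, detections, fps):
    """Metric positions, speed (m/s) and heading (rad) for one track.

    detections: (original_frame_index, box) pairs, in any order.
    Returns (frame, X, Y, speed, heading) tuples sorted by frame; the first
    observation (or a repeated frame index) has speed and heading set to None.
    """
    pts = [(f, *to_ground(H, *ground_contact(b))) for f, b in sorted(detections)]
    out, prev = [], None
    for f, X, Y in pts:
        if prev is None or f == prev[0]:
            out.append((f, X, Y, None, None))
        else:
            dt = (f - prev[0]) / fps  # time step from ORIGINAL frame indices
            vx, vy = (X - prev[1]) / dt, (Y - prev[2]) / dt
            out.append((f, X, Y, math.hypot(vx, vy), math.atan2(vy, vx)))
        prev = (f, X, Y)
    return out
```

Because `dt` is computed from original frame indices, the same code stays correct when frames are sampled with a non-uniform or large step; multiplying the speed by 3.6 gives km/h.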

Road-Geometry Calibration

The mobile-camera calibration is defined from visible road geometry. The workflow supports human-in-the-loop calibration: the operator selects visible road-plane points and assigns metric coordinates. Red points are active correspondences; cyan points provide additional visual checks.

Fig. 1 — Road-geometry calibration on the UAVDT M1401 reference frame. The calibrated ground plane spans approximately 24 m across the road and 90 m along the road.
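The correspondence-based calibration can be illustrated with a plain direct linear transform (DLT). The sketch below is an assumption-laden illustration, not the authors' solver: it fixes h33 = 1, solves the least-squares system on Hartley-style normalized coordinates, and treats the cyan check points as held-out correspondences with known metric coordinates, which the report describes only as visual checks. All function names are hypothetical; in practice a library routine such as OpenCV's `cv2.findHomography` serves the same purpose.

```python
import math

def apply_h(H, x, y):
    """Map (x, y) through a 3x3 homography given as nested lists and dehomogenize."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

def _matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def _normalizer(pts):
    """Similarity T (and its inverse) giving zero centroid and mean distance sqrt(2)."""
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    s = math.sqrt(2) * n / sum(math.hypot(p[0] - cx, p[1] - cy) for p in pts)
    T = [[s, 0.0, -s * cx], [0.0, s, -s * cy], [0.0, 0.0, 1.0]]
    T_inv = [[1.0 / s, 0.0, cx], [0.0, 1.0 / s, cy], [0.0, 0.0, 1.0]]
    return T, T_inv

def _solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(row) + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x

def estimate_homography(img_pts, ground_pts):
    """Image-to-ground homography from >= 4 (pixel, meter) correspondences."""
    if len(img_pts) < 4 or len(img_pts) != len(ground_pts):
        raise ValueError("need at least 4 matched correspondences")
    Ti, _ = _normalizer(img_pts)
    Tg, Tg_inv = _normalizer(ground_pts)
    rows, rhs = [], []
    for (x, y), (X, Y) in zip(img_pts, ground_pts):
        x, y = apply_h(Ti, x, y)
        X, Y = apply_h(Tg, X, Y)
        # X * (h6*x + h7*y + 1) = h0*x + h1*y + h2, and likewise for Y.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * X, -y * X])
        rhs.append(X)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -x * Y, -y * Y])
        rhs.append(Y)
    # Normal equations: exact for 4 points, least squares for more.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    Atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = _solve(AtA, Atb) + [1.0]
    Hn = [h[0:3], h[3:6], h[6:9]]
    H = _matmul3(Tg_inv, _matmul3(Hn, Ti))  # undo both normalizations
    return [[v / H[2][2] for v in row] for row in H]

def check_point_errors(H, checks):
    """Metric error (m) at held-out ((x_px, y_px), (X_m, Y_m)) check points."""
    errors = []
    for (x, y), (X, Y) in checks:
        u, v = apply_h(H, x, y)
        errors.append(math.hypot(u - X, v - Y))
    return errors
```

With exactly four red points the linear system is square and the fit reproduces them exactly, so residuals on independent check points are the only internal signal of calibration quality.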

Synchronized Output

The strongest demo view is the synchronized triptych: original UAV frame, metric BEV view, and dynamic 3D cuboid scene. This representation separates image appearance from metric traffic state.

Fig. 2 — Synchronized demonstration frame. Left: original UAV frame. Upper right: metric BEV view. Lower right: metric 3D cuboid scene.

Metric BEV Tracks

Vehicle tracks projected from oblique image coordinates into a local metric road-plane coordinate system. Trajectories are expressed in calibrated meters rather than perspective image pixels.

Fig. 3 — Metric BEV vehicle tracks for the M1401 run.
Metric BEV animation showing tracked vehicle positions over time
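Rendering the metric tracks as a BEV image needs only a fixed scale and origin. A minimal sketch, assuming X runs across the road and Y along it, with the origin at one corner of the calibrated patch; the default extent reuses the approximate 24 m by 90 m span from Fig. 1, while `px_per_m` and the function names are hypothetical.

```python
import math

def bev_canvas(x_range=(0.0, 24.0), y_range=(0.0, 90.0), px_per_m=10.0):
    """Width and height (pixels) of a BEV raster covering the calibrated road patch."""
    w = round((x_range[1] - x_range[0]) * px_per_m)
    h = round((y_range[1] - y_range[0]) * px_per_m)
    return int(w), int(h)

def metric_to_bev(X, Y, x_range=(0.0, 24.0), y_range=(0.0, 90.0), px_per_m=10.0):
    """Map road-plane meters to a BEV pixel (col, row), or None outside the raster.

    Rows grow downward, so the far end of the road (largest Y) lands on row 0.
    """
    w, h = bev_canvas(x_range, y_range, px_per_m)
    col = math.floor((X - x_range[0]) * px_per_m)
    row = math.floor((y_range[1] - Y) * px_per_m)
    if 0 <= col < w and 0 <= row < h:
        return col, row
    return None
```

Keeping the scale in one place means the BEV panel and the metric trajectories never disagree about units: distances measured on the raster divide back to meters by `px_per_m`.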

Dynamic 3D Cuboids

A lightweight traffic-state representation — not a photorealistic reconstruction. Each vehicle is an oriented metric cuboid with class-dependent dimensions and road-axis-aware heading.

Fig. 4 — Metric 3D cuboid preview for one sampled frame.
Dynamic 3D cuboid animation

Vehicle Dimension Priors

Cuboid dimensions are based on class-level priors. In the M1401 sequence, all annotated vehicles correspond to the car category.

| Class | Length (m) | Width (m) | Height (m) |
|---|---|---|---|
| Car | 4.5 | 1.8 | 1.5 |
| Truck | 8.0 | 2.5 | 3.0 |
| Bus | 12.0 | 2.6 | 3.2 |
| Generic vehicle | 4.5 | 1.8 | 1.5 |
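A cuboid can be generated from these priors by rotating a footprint rectangle to the vehicle heading and extruding it by the class height. In this sketch the dictionary keys, the function name, and the choice to treat the projected ground point as the footprint center are assumptions; the bottom-center contact point used by the pipeline is not the footprint center in general, so a real implementation may need to offset it.

```python
import math

# Class-level priors (length, width, height) in meters, copied from the table above.
PRIORS = {
    "car": (4.5, 1.8, 1.5),
    "truck": (8.0, 2.5, 3.0),
    "bus": (12.0, 2.6, 3.2),
    "vehicle": (4.5, 1.8, 1.5),  # generic fallback
}

def cuboid_corners(x, y, heading, cls):
    """8 corners (X, Y, Z) of an oriented cuboid resting on the road plane Z = 0.

    (x, y) is treated as the footprint center; heading (rad) is the direction
    of the length axis. Unknown classes fall back to the generic prior.
    """
    length, width, height = PRIORS.get(cls, PRIORS["vehicle"])
    c, s = math.cos(heading), math.sin(heading)
    corners = []
    for z in (0.0, height):  # bottom face first, then top face
        for dl, dw in ((0.5, 0.5), (0.5, -0.5), (-0.5, -0.5), (-0.5, 0.5)):
            lx, ly = dl * length, dw * width  # offset in the vehicle frame
            corners.append((x + c * lx - s * ly, y + s * lx + c * ly, z))
    return corners
```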

Key Findings

Experimental observations from the UAVDT M1401 evaluation.

01

Automatic calibration is viable; human refinement improves it

Human-in-the-loop calibration provides a moderate increase in geometric accuracy over fully automatic estimation, particularly for lane alignment and far-field consistency. However, automatic calibration already produces usable results and leaves room for further improvement, which makes it a practical option for real-world deployments where manual intervention is not feasible.

02

Far-field vehicles are highly sensitive to homography errors

Distant vehicles occupy fewer pixels and lie near the vanishing region of the perspective projection. Small image-coordinate or lane-width errors can produce large lateral shifts in metric coordinates, making far-field drift a key diagnostic for calibration quality.
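The effect can be reproduced with an idealized pinhole camera, whose image-to-ground mapping is itself a homography. All camera parameters below (focal length 1000 px, a 1024 x 540 image, 60 m altitude, 30 degree downward pitch) are hypothetical values chosen for illustration, not the M1401 calibration, and the function names are likewise hypothetical. The sketch measures how far a one-pixel image error moves the projected ground point near the bottom and near the top of the frame.

```python
import math

def pixel_to_ground(u, v, f=1000.0, cx=512.0, cy=270.0, cam_height=60.0, pitch_deg=30.0):
    """Intersect the viewing ray of pixel (u, v) with the ground plane Z = 0.

    Hypothetical camera at height cam_height (m), looking along +Y and pitched
    down by pitch_deg; image v grows downward. Returns (X across, Y along) in m.
    """
    th = math.radians(pitch_deg)
    a = (u - cx) / f
    b = (v - cy) / f
    denom = b * math.cos(th) + math.sin(th)  # vertical component of the ray
    if denom <= 0:
        raise ValueError("pixel is at or above the horizon")
    t = cam_height / denom
    return t * a, t * (math.cos(th) - b * math.sin(th))

def metric_shift_per_pixel(u, v, du=1.0, dv=0.0, **cam):
    """Ground displacement (m) caused by moving the image point by (du, dv) pixels."""
    x0, y0 = pixel_to_ground(u, v, **cam)
    x1, y1 = pixel_to_ground(u + du, v + dv, **cam)
    return math.hypot(x1 - x0, y1 - y0)
```

With these values, a one-pixel vertical error moves the ground point by about 0.84 m at the top row but only about 0.11 m at the bottom row, and the along-road sensitivity grows with the inverse square of the ray's vertical component as the horizon is approached.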

03

Road-axis-aware heading reduces cuboid jitter

Raw motion-based heading is noisy due to annotation jitter and frame sampling. Regularizing headings toward a dominant road axis produces more plausible cuboid orientation on straight segments.
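One possible regularization is sketched below under assumptions the report does not spell out. The dominant road axis is estimated as the axial (doubled-angle) mean of track displacements, which ignores travel direction. Each motion heading is then pulled toward the nearer of the two travel directions along that axis, but only when it deviates by less than a threshold, so turning vehicles keep their raw heading. The threshold, blend weight, and function names are hypothetical. If the metric frame is aligned with the road during calibration, the axis can instead be taken directly from the calibrated along-road direction.

```python
import math

def wrap_angle(a):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def dominant_axis(displacements):
    """Length-weighted axial mean of (dx, dy) directions, in radians (modulo pi)."""
    s = sum(math.hypot(dx, dy) * math.sin(2 * math.atan2(dy, dx)) for dx, dy in displacements)
    c = sum(math.hypot(dx, dy) * math.cos(2 * math.atan2(dy, dx)) for dx, dy in displacements)
    return 0.5 * math.atan2(s, c)

def regularize_heading(motion_heading, road_axis, max_dev_deg=20.0, weight=0.8):
    """Pull a noisy heading toward road_axis or road_axis + pi, whichever is nearer.

    Within max_dev_deg, only (1 - weight) of the deviation is kept; larger
    deviations (e.g. turning vehicles) return the raw heading unchanged.
    """
    best = min((road_axis, road_axis + math.pi),
               key=lambda cand: abs(wrap_angle(motion_heading - cand)))
    dev = wrap_angle(motion_heading - best)
    if abs(dev) > math.radians(max_dev_deg):
        return wrap_angle(motion_heading)
    return wrap_angle(best + (1.0 - weight) * dev)
```

Doubling the angles before averaging makes opposite travel directions on a two-way road reinforce rather than cancel each other.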

04

Frame mapping is critical for trajectory fidelity

Using an incorrect frame step caused annotation boxes to move approximately twice as fast as visible vehicles. Every sampled frame must be associated with its original frame index through an explicit mapping table.
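A minimal sketch of an explicit mapping table, followed by a worked check of the failure mode. The helper names, the 0-based start index, and the toy track are hypothetical. The check models the reported bug as a table built with twice the true step: each sampled frame then fetches annotations from twice as far into the video, so boxes advance twice as fast as the vehicles visible in the sampled frames.

```python
def build_frame_map(num_frames, step, start=0):
    """Explicit mapping table: sampled index k -> original frame index."""
    return {k: f for k, f in enumerate(range(start, num_frames, step))}

def annotations_for_sample(frame_map, annotations, k):
    """Look up annotations by the ORIGINAL frame index of sampled frame k."""
    return annotations.get(frame_map[k], [])

def mean_box_advance(frame_map, box_x):
    """Mean per-sampled-frame advance of one box (box_x: original frame -> x in px)."""
    ks = sorted(frame_map)
    xs = [box_x[frame_map[k]] for k in ks]
    return (xs[-1] - xs[0]) / (len(xs) - 1)
```

For a toy box that moves 2 px per original frame and a video sampled every 5th frame, `mean_box_advance` gives 10 px per sampled frame with the correct table and 20 px with a table built for a step of 10, reproducing the factor of two.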

Why not full 3D reconstruction?

Moving vehicles are difficult for classical 3D reconstruction or 3D Gaussian Splatting because they violate static-scene assumptions. Each vehicle may appear in only a small number of frames and its position changes independently of the camera. For traffic analytics, detailed mesh reconstruction is unnecessary — a metric cuboid representation is more practical, faster, and better aligned with traffic-surveillance needs.

Chosen representation

Static: road plane / BEV map / optional background reconstruction
Dynamic: tracked 3D boxes or lightweight CAD-like shapes
Priority: speed, low compute cost, interpretable geometry, track consistency

Known Limitations

Planar road assumption

The method assumes the road is locally planar. Curved or multi-level road geometry is not modeled.

Approximate cuboids

Vehicle cuboids use class-level dimension priors rather than measured per-vehicle geometry.

Calibration sensitivity

Homography quality depends strongly on calibration point placement. Small errors cause large far-field displacements.

Ground-point approximation

Bottom-center ground contact is an approximation: in oblique views the bounding box may also enclose roof pixels or shadows, which shifts the estimated contact point.

Single-plane constraint

A road-plane homography cannot reconstruct vertical geometry such as poles, signs, or barriers.

Manual validation required

Human-in-the-loop calibration is currently more reliable than fully automatic methods.

Dataset

The pipeline was developed and tested on several sequences from the UAVDT benchmark [1], which provides annotated UAV traffic videos under challenging conditions, including camera motion, high traffic density, small object size, occlusion, and scale variation.

References

[1] D. Du, Y. Qi, H. Yu, Y. Yang, K. Duan, G. Li, W. Zhang, Q. Huang, and Q. Tian, "The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking," in Proc. ECCV, 2018, pp. 370–386.
[2] S. Byun, D. Lee, H. Park, and H. Choi, "Road Traffic Monitoring from UAV Images Using Deep Learning Networks," Remote Sensing, vol. 13, no. 20, Art. no. 4027, 2021.
[3] S. M. Tilon, T. Nex, and G. Vosselman, "Vehicle Tracking and Speed Estimation from UAV Videos," ISPRS Annals, vol. X-1/W1-2023, pp. 431–438, 2023.
[4] G. D'Amicantonio, E. Bondarev, and P. H. N. de With, "Automated Camera Calibration via Homography Estimation with GNNs," in Proc. WACV, 2024, pp. 5876–5883.
[5] J. Philion and S. Fidler, "Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D," in Proc. ECCV, 2020, pp. 194–210.
[6] N. Trukhina and V. Vashkelis, "Hybrid Visual Telemetry for Bandwidth-Constrained Robotic Vision: A Pilot Study with HEVC Base Video and JPEG ROI Stills," arXiv:2605.01826, 2026.

Interested in this research?

We are open to collaborations on UAV perception, BEV reconstruction, and embedded traffic analytics.
