Description
Recent advances in autonomous driving perception increasingly rely on multi-view,
multi-sensor fusion to recover dense and semantically consistent 3D scene representations.
While LiDAR upsampling has been explored, it remains an open problem: it is often treated as a
standalone task without leveraging complementary visual cues or learning consistent geometric
priors across modalities. This thesis proposes a cross-modal transformer framework designed to
systematically explore camera-LiDAR fusion strategies for enhancing LiDAR upsampling.
Rather than committing to a single paradigm, we will investigate multiple fusion approaches,
spanning range-view, frustum-based, and latent feature alignment schemes, to determine how
multi-view image backbones can best inform the upsampling of sparse LiDAR signals. A
transformer-based bridge module will be developed to connect image and LiDAR
representations, potentially through attention or geometry-aware correlation mechanisms.
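To make the intended bridge module concrete, the sketch below shows one plausible
instantiation in PyTorch: sparse LiDAR tokens (e.g. embedded range-view pixels) query
flattened multi-view image tokens through cross-attention. All names, dimensions, and the
block layout are illustrative assumptions, not a fixed design decision of the thesis.

    import torch
    import torch.nn as nn

    class CrossModalBridge(nn.Module):
        # Hypothetical bridge: LiDAR tokens query image tokens via cross-attention.
        def __init__(self, dim=256, num_heads=8):
            super().__init__()
            self.norm_q = nn.LayerNorm(dim)   # normalize LiDAR queries
            self.norm_kv = nn.LayerNorm(dim)  # normalize image keys/values
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm_ffn = nn.LayerNorm(dim)
            self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                     nn.Linear(4 * dim, dim))

        def forward(self, lidar_tokens, image_tokens):
            # lidar_tokens: (B, N, dim), e.g. embedded range-view pixels
            # image_tokens: (B, M, dim), flattened multi-view image features
            q = self.norm_q(lidar_tokens)
            kv = self.norm_kv(image_tokens)
            fused, _ = self.attn(q, kv, kv)        # LiDAR attends to image features
            x = lidar_tokens + fused               # residual connection
            return x + self.ffn(self.norm_ffn(x))  # position-wise refinement

    # Example shapes: 2 scenes, 4096 LiDAR tokens, 6 cameras x 700 patches each.
    bridge = CrossModalBridge(dim=256)
    fused = bridge(torch.randn(2, 4096, 256), torch.randn(2, 6 * 700, 256))

A geometry-aware variant could additionally bias the attention weights with projective
camera-LiDAR correspondences, which is one of the design questions the thesis leaves open.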
Through experiments on autonomous driving benchmarks, we will evaluate how cross-modal
supervision and feature sharing improve spatial coherence, realism, and reconstruction fidelity.
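As one concrete notion of reconstruction fidelity, the symmetric Chamfer distance between an
upsampled cloud and a denser reference scan is a common choice; the sketch below assumes this
metric for illustration, since the proposal does not fix a particular one.

    import torch

    def chamfer_distance(pred, gt):
        # pred: (B, N, 3) upsampled points; gt: (B, M, 3) dense reference scan.
        d = torch.cdist(pred, gt)  # (B, N, M) pairwise Euclidean distances
        # Symmetric average of nearest-neighbor distances in both directions.
        return d.min(dim=2).values.mean(dim=1) + d.min(dim=1).values.mean(dim=1)

    # Example: compare a 4x-upsampled cloud against a denser ground truth.
    loss = chamfer_distance(torch.randn(2, 4096, 3), torch.randn(2, 16384, 3))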
The study aims to establish a principled understanding of which fusion paradigms are most
effective for realistic LiDAR upsampling.
Supervisor
Prof. Dr. Vasileios Belagiannis