Description
Visual foundation models (VFMs), like DINOv3, V-JEPA, …, are widely employed for diverse computer vision tasks. Unlike traditional computer vision pipelines, VFMs learn rich, transferable representations from vast amounts of data, enabling strong generalization across tasks with little to no fine-tuning.
This research will explore the use of VFMs in machine-to-machine communication of visual data.
Possible Topics
- Cross-architecture feature translation
- Intra-frame feature prediction
- Inter-frame feature prediction (forecasting)
- VFM feature compression
Requirements
Experience in Python programming, deep learning (PyTorch/TF), and image and video compression
Supervisor
Marc Windsheimer
marc.windsheimer@fau.de
Room 06.036
Professor
Prof. Dr.-Ing. André Kaup
andre.kaup@fau.de
Room 06.031