In this project we investigate various approaches for transferring deep neural networks trained in a simulated environment to the real world. Computer vision contributes to this goal by producing visual priors that act as input to a downstream system.
Recent studies on embodied agents show the importance of visual priors, such as semantic segmentation or depth estimation in autonomous driving: when an agent performs actions in the physical environment, such priors can increase its accuracy significantly. However, producing these visual priors requires large amounts of annotated data. Building such datasets is costly and tedious, as annotation often requires domain-specific knowledge and is time-consuming. Simulation offers an alternative approach to data and annotation generation, capable of creating large amounts of data tailored to the task at hand. Although simulated data has clear advantages over real-world datasets, it also has a clear limitation: a deep neural network trained on synthetic data does not perform well when tested on real-world data, because real samples differ significantly in realism, for example in their textures.
The aim of this project is to conduct research on closing this performance gap. Our main testbed for measuring this performance will be semantic image segmentation and depth estimation from single images.
There are several ways to approach closing the performance gap in sim2real:
1. Domain adaptation
Domain adaptation is a popular approach in computer vision to transfer a model trained on one dataset (the source domain) to another dataset (the target domain). This can be achieved by learning domain-invariant features or by adapting the source domain images to look like the target domain images using image-to-image models, adversarial learning, or a combination of both.
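One concrete instance of aligning source and target features is CORrelation ALignment (CORAL), which re-colors source-domain features with the target domain's second-order statistics. The sketch below is only an illustration of the feature-alignment idea, not the specific method used in this project; all array shapes and the mean-shift step are assumptions.

```python
import numpy as np

def coral(source: np.ndarray, target: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Align source features (n_s, d) to target features (n_t, d):
    whiten the source covariance, then re-color with the target covariance
    (CORAL-style second-order alignment). eps regularizes the covariances."""
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])

    def sqrtm(m: np.ndarray, inverse: bool = False) -> np.ndarray:
        # Matrix (inverse) square root via eigendecomposition; both
        # covariance matrices are symmetric positive definite after +eps*I.
        vals, vecs = np.linalg.eigh(m)
        vals = np.clip(vals, eps, None)
        d = vals ** (-0.5 if inverse else 0.5)
        return (vecs * d) @ vecs.T

    centered = source - source.mean(axis=0)
    # Whiten source statistics, re-color with target statistics,
    # then shift to the target mean (the mean shift is an added assumption).
    return centered @ sqrtm(cs, inverse=True) @ sqrtm(ct) + target.mean(axis=0)
```

After this transformation a segmentation or depth model can be trained on the aligned source features, so that at test time the target-domain statistics it sees match what it was trained on.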
2. Domain randomization
Domain randomization is a popular methodology primarily used in robotics. The idea is to randomly vary the parameters of the simulation to make the model robust to variations in the real world. This is done by generating a large number of synthetic training examples that cover the range of variations expected in the real world.
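The randomization step above can be sketched as sampling each simulation parameter from a range that covers the expected real-world variation. The parameter names and ranges below are purely illustrative and not tied to any specific simulator.

```python
import random

# Hypothetical simulation parameters; names and ranges are illustrative.
PARAM_RANGES = {
    "light_intensity": (0.2, 2.0),   # relative scene brightness
    "camera_height_m": (1.2, 1.8),   # camera height above ground, metres
    "texture_id":      (0, 49),      # index into a bank of surface textures
    "fog_density":     (0.0, 0.3),   # strength of atmospheric scattering
}

def sample_scene_config(rng: random.Random) -> dict:
    """Draw one randomized scene configuration for the renderer."""
    cfg = {}
    for name, (lo, hi) in PARAM_RANGES.items():
        # Integer bounds are sampled as discrete choices, floats uniformly.
        cfg[name] = rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi)
    return cfg

# Each configuration would drive the simulator to render one synthetic
# training image; together the batch covers the randomized variation.
rng = random.Random(0)
configs = [sample_scene_config(rng) for _ in range(1000)]
```

A model trained on images rendered from such configurations sees enough variation that the real world ideally appears as just another sample from the randomized distribution.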
3. Image generation
Another approach is to generate realistic images that can be used to train a model. This can be done using generative deep neural networks that translate synthetic images into realistic ones, or by generating a canonical representation of the input data and using it to train a deep learning model.
Meta-learning, also known as learning to learn, is a subfield of machine learning that focuses on the ability of a model to learn new tasks quickly and efficiently by leveraging prior knowledge learned from related tasks. The idea behind meta-learning is that instead of training a model on a single task from scratch, a meta-learning model is trained on a distribution of tasks, with the goal of learning a generalizable representation that can be adapted to new tasks with minimal fine-tuning.
In our context, meta-learning is proposed as a way to address the problem of poor performance when transferring deep neural network models from simulation to real-world applications. The main idea is to reduce the performance gap by teaching the deep neural network how to quickly adapt to new environments through meta-learning. Thus our ultimate goal is to learn a model in simulation that can be adapted to the real-world with minimal fine-tuning.
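The inner/outer structure of such meta-learning can be made concrete with a minimal first-order MAML sketch on a toy task family (scalar regression with a random slope per task). Everything here, including the task family, learning rates, and the single inner step, is an illustrative assumption rather than this project's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task():
    """A toy task family: fit y = a * x, with a random slope a per task
    (analogous to 'a new environment' in the sim-to-real setting)."""
    a = rng.uniform(0.5, 1.5)
    x = rng.uniform(-1.0, 1.0, size=20)
    return x, a * x

def grad(w: float, x: np.ndarray, y: np.ndarray) -> float:
    """Gradient of the mean-squared error for the scalar model y_hat = w * x."""
    return 2.0 * float(np.mean((w * x - y) * x))

w = 0.0                     # meta-parameter: the initialization being learned
inner_lr, outer_lr = 0.1, 0.01
for _ in range(2000):
    x, y = make_task()
    # Inner loop: one gradient step adapts w to the sampled task
    # (the "minimal fine-tuning" in the new environment).
    w_task = w - inner_lr * grad(w, x, y)
    # Outer loop (first-order MAML): update the initialization using the
    # post-adaptation gradient, ignoring second-order terms.
    w -= outer_lr * grad(w_task, x, y)
```

After meta-training, `w` lies near the center of the task distribution, so a single gradient step already fits a new, unseen task well; the same principle, scaled up, is what would let a segmentation or depth network adapt quickly from simulation to the real world.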
In our current research we want to leverage generative deep learning to produce realistic images, following approach 3. We realize this by implementing state-of-the-art approaches such as diffusion models and their variants.
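To make the diffusion idea concrete: the DDPM forward process corrupts a clean image x_0 in closed form, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, and a denoising network is trained to invert this corruption. The sketch below shows only the forward process with the commonly used linear schedule; the schedule values and the random "image" are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule in the range commonly used for DDPMs;
# alpha_bar_t is the cumulative product of (1 - beta).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0: np.ndarray, t: int, noise: np.ndarray) -> np.ndarray:
    """Closed-form forward diffusion:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

# A synthetic stand-in for an image, noised at increasing timesteps:
# the signal fades and x_t approaches pure Gaussian noise as t -> T.
x0 = rng.standard_normal((32, 32))
noisy = {t: q_sample(x0, t, rng.standard_normal(x0.shape)) for t in (0, 100, 999)}
```

The reverse (generative) direction, where the trained network iteratively denoises from pure noise back to an image, is what would ultimately turn synthetic renderings into realistic training images.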
This project is funded by the DFG (project number 458972748).