Next-Generation Video Communications

Our group works on different topics in the domain of image and video compression. In this direction, we study state-of-the-art coding standards like HEVC and we develop new methods for upcoming standards like VVC. In addition, we tackle new and unconventional approaches for coding content like medical image and video data, fisheye videos, or 360° videos.

Contact person: Dr.-Ing. Christian Herglotz

Coding with Machine Learning

Video Coding for Deep Learning-Based Machine-to-Machine Communication

Kristian Fischer, M.Sc.
Link to person

Commonly, video codecs are designed and optimized with humans as the final users. Nowadays, however, more and more multimedia data is transmitted for so-called machine-to-machine (M2M) applications, meaning that the data is not observed by humans but processed by subsequent algorithms solving various tasks. Such applications include smart industry, video surveillance, and autonomous driving scenarios. For these M2M applications, the detection rate of the subsequent algorithm is decisive rather than the subjective visual quality for humans. To evaluate the coding quality for M2M applications, we are currently focusing on neural object detection with Region-based Convolutional Neural Networks (R-CNNs).

One interesting question regarding video compression for M2M communication systems is how much the original data can be compressed before the detection rate drops. In addition, we are testing modifications of current video codecs to achieve an optimal trade-off between compression and detection rate.
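The compression-versus-detection question above can be illustrated with a small sketch. Everything here is hypothetical: the "codec" is just coarse pixel quantization, and the detection score is a placeholder threshold on reconstruction error, standing in for a real R-CNN evaluation.

```python
# Hypothetical sketch: sweep the quantization step of a toy codec and
# record how a (placeholder) detection score degrades with compression.
# Neither the codec nor the detector reflects an actual implementation.

def quantize(pixels, step):
    """Coarsely quantize 8-bit pixel values with the given step size."""
    return [round(p / step) * step for p in pixels]

def mse(a, b):
    """Mean squared error between two pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def detection_proxy(original, decoded, threshold=100.0):
    """Placeholder for an R-CNN detection score: assume detection
    succeeds while the reconstruction error stays below a threshold."""
    return 1.0 if mse(original, decoded) < threshold else 0.0

frame = [16, 32, 48, 64, 96, 128, 160, 200]   # toy 8-bit "frame"
for step in (1, 8, 32, 64):
    decoded = quantize(frame, step)
    print(step, mse(frame, decoded), detection_proxy(frame, decoded))
```

In a real study, the quantization step would be the codec's QP and the proxy would be replaced by the detector's actual accuracy on the decoded frames.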


Deep Learning for Video Coding

Fabian Brand, M.Sc.
Link to person

The increasing processing power of mobile devices will make it possible in the long term to employ deep learning techniques in coding standards. Many components of modern video coders can be implemented using neural networks. The focus of this research is intra-frame prediction. This concept has been part of video coders for a long time: the content of a block is estimated purely from its spatial neighborhood, such that only the difference has to be transmitted. In contrast to so-called inter-frame prediction, which employs different reference pictures to reduce temporal redundancy, intra-frame prediction only uses the to-be-coded image itself, thus reducing spatial redundancy.

Previous coding standards mainly use angular prediction, where pixels from the border area are copied into the block at a certain angle. This method is very efficient but unable to predict non-linear structures. Since neural networks are so-called universal approximators, meaning they can approximate arbitrary functions arbitrarily closely, they are also able to predict more complex structures. The following picture shows an example block that has been predicted with a traditional method and with a neural-network-based approach. We see that the neural network performs better in predicting the round shape in the block.
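The "copying border pixels at an angle" idea can be sketched in a few lines. This is a simplified illustration of angular intra-prediction, not the standardized HEVC/VVC algorithm: it only uses the reference row above the block and an integer shift per row.

```python
# Minimal sketch of angular intra-prediction (simplified): reference
# pixels from the row above the block are copied into the block along
# a fixed displacement per row. Illustrative only, not the
# standardized interpolation-based algorithm.

def angular_predict(top_refs, block_size, shift_per_row):
    """Predict a block_size x block_size block by copying the reference
    row above it, shifted by shift_per_row pixels per row down."""
    block = []
    for y in range(block_size):
        offset = y * shift_per_row
        row = [top_refs[min(x + offset, len(top_refs) - 1)]
               for x in range(block_size)]
        block.append(row)
    return block

refs = [10, 20, 30, 40, 50, 60, 70, 80]
# Vertical prediction (shift 0): every row repeats the reference row.
pred = angular_predict(refs, 4, 0)
# Diagonal-style prediction (shift 1) propagates references at 45°.
diag = angular_predict(refs, 4, 1)
```

A neural-network predictor replaces this fixed copy rule with a learned, non-linear function of all neighboring reference pixels, which is why it can reproduce curved structures that no single angle captures.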

Left: Original, Center: Traditional Method (VTM 4.2), Right: Prediction with a neural network.

Energy and Power Efficient Video Communications

Nowadays, video communications have conquered the mass markets such that billions of end-users worldwide make use of online video applications on highly versatile devices like smartphones, TVs, or tablet PCs. Recent studies show that 1% of the greenhouse gas emissions worldwide are related to video communication services (link). This number includes all factors in the video communication toolchain such as video capture, compression, storage, transmission, decoding, and replay. Due to the large impact and the potential rise in video communication demand in the future, it is highly important to investigate the energy efficiency of practical solutions and come up with new ideas to allow a sustainable use of this technology.

To tackle this important problem, our group is committed to research in the field of energy-efficient video communication solutions. In the past, we constructed dedicated measurement setups to analyze the energy and power consumption of various hardware and software tools related to video communications. In terms of hardware, we tested desktop PCs, evaluation boards, smartphones, and distinct components of these devices. In terms of software, we investigated decoders for multiple compression standards, hardware decoding chips, and fully functional media players. With the help of this data, we were able to develop accurate energy and power models describing the consumption in high detail. Additionally, these models allowed us to come up with new ideas to reduce and optimize the energy consumption.
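At the core of such measurement setups is a simple relation: energy is the integral of power over time, E = ∫ P(t) dt, approximated from equally spaced power samples. The sketch below uses the trapezoidal rule on made-up sample values.

```python
# Sketch of how a measurement setup turns sampled power readings into
# an energy figure: E = ∫ P(t) dt, approximated by the trapezoidal
# rule. The sample values below are fictitious.

def energy_from_samples(power_watts, dt_seconds):
    """Integrate equally spaced power samples (W) into energy (J)."""
    if len(power_watts) < 2:
        return 0.0
    inner = sum(power_watts[1:-1])
    return dt_seconds * (0.5 * power_watts[0] + inner + 0.5 * power_watts[-1])

samples = [1.0, 1.5, 2.0, 1.5, 1.0]  # fictitious power trace in watts
print(energy_from_samples(samples, 0.1))  # ≈ 0.6 J
```

Real setups additionally have to subtract the device's idle power and synchronize the power trace with the start and end of the decoding run.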

We strive to dig deeper into this topic to obtain a fundamental understanding of all components contributing to the overall power consumption. Current topics include the encoding process, streaming and transmission issues, and modern video formats like 360° video. For future work, we are always searching for interesting ideas and collaborations to devise new and promising solutions for energy-efficient video communications. We would be happy if you are interested in supporting our work.

Currently, we are working on the following topics:

Energy Efficient Video Coding

Matthias Kränzler, M.Sc.
Link to person

In recent years, the amount and share of video data in global internet traffic has been steadily increasing. Both the encoding on the transmitter side and the decoding on the receiver side have a high energy demand. Research on energy-efficient video decoding has shown that it is possible to optimize the energy demand of the decoding process. This research area deals with modeling the energy required for encoding video data. The aim of this modeling is to optimize the energy efficiency of the entire video coding chain.

„Big Buck Bunny“ by Big Buck Bunny is licensed under CC BY 3.0


Energy Efficient Video Decoding

Dr.-Ing. Christian Herglotz
Link to person

This field of research tackles the power consumption of video decoding systems. In this respect, software as well as hardware systems are studied in high detail. A detailed analysis of the decoding energy on various platforms under various conditions can be found on the DEVISTO homepage:

Decoding Energy Visualization Tool (DEVISTO)

With the help of a high number of measurements, sophisticated energy models could be constructed which are able to accurately estimate the overall power and energy. A visualization of the modeling process for software decoding is given on the DENESTO homepage:

Decoding Energy Estimation Tool (DENESTO)
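One common form such models take is a weighted sum over bitstream features: each feature (e.g., the number of intra-coded blocks) contributes its count times a measured specific energy. The feature names, specific energies, and offset below are invented for illustration and do not reproduce an actual published model.

```python
# Hedged sketch of a feature-based decoding-energy model: the decoding
# energy is estimated as a constant offset plus a weighted sum of
# bitstream feature counts. All numbers below are made up.

SPECIFIC_ENERGIES = {          # energy per feature occurrence (fictitious)
    "intra_blocks": 2.0e-6,    # joules per intra-coded block
    "inter_blocks": 3.5e-6,    # joules per inter-coded block
    "coded_bits":   1.0e-9,    # joules per coded bit
}

def estimate_decoding_energy(feature_counts, constant_offset=0.05):
    """E = E0 + sum_i n_i * e_i over the bitstream features."""
    return constant_offset + sum(
        n * SPECIFIC_ENERGIES[name] for name, n in feature_counts.items())

bitstream = {"intra_blocks": 1000, "inter_blocks": 5000,
             "coded_bits": 2_000_000}
print(estimate_decoding_energy(bitstream))
```

In practice, the specific energies are obtained by least-squares fitting against a large set of measured bitstreams, which is the measurement effort mentioned above.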

Finally, the information from the model can be exploited in rate-distortion optimization during encoding to obtain bit streams requiring less decoding energy. The source code of such an encoder can be downloaded here:

Decoding-Energy-Rate-Distortion Optimization for Video Coding (DERDO)
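The idea behind such an encoder can be sketched as an extension of the classic Lagrangian mode decision: instead of minimizing J = D + λ·R, the encoder minimizes J = D + λ·R + μ·E, where E is the estimated decoding energy. The mode names and cost values below are illustrative only.

```python
# Sketch of a decoding-energy-aware mode decision: the classic
# rate-distortion cost J = D + lam*R is extended by a weighted
# decoding-energy term, J = D + lam*R + mu*E. Numbers are fictitious.

def rd_cost(distortion, rate, lam):
    """Classic Lagrangian rate-distortion cost."""
    return distortion + lam * rate

def derd_cost(distortion, rate, energy, lam, mu):
    """Decoding-energy-rate-distortion cost J = D + lam*R + mu*E."""
    return distortion + lam * rate + mu * energy

# Two hypothetical coding modes: (distortion, rate in bits, energy in mJ)
modes = {"mode_a": (10.0, 400, 3.0), "mode_b": (12.0, 350, 1.0)}

def best_mode(modes, lam, mu):
    """Pick the mode with the lowest extended cost."""
    return min(modes, key=lambda m: derd_cost(*modes[m], lam, mu))

print(best_mode(modes, lam=0.02, mu=0.0))   # plain RD decision
print(best_mode(modes, lam=0.02, mu=2.0))   # energy-aware decision
```

With μ = 0 the decision reduces to ordinary rate-distortion optimization; increasing μ trades some rate-distortion performance for bit streams that are cheaper to decode.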

Coding of Medical Content

Scalable Lossless Coding of Dynamic Medical Data Using Compensated Multi-Dimensional Wavelet-Lifting:

Daniela Wokusch, M.Sc.
Link to person

This project focuses on scalable lossless coding of dynamic medical data. An efficient scalable representation of dynamic volume data from medical devices like computed tomography scanners is very important for telemedicine applications. Lossless reconstruction is required by law and has to be guaranteed. Compensated wavelet lifting combines scalability features and lossless reconstruction in a single processing step.

A wavelet transform (WT) decomposes a signal into a highpass and a lowpass subband. This allows for analyzing the signal at multiple resolutions and provides efficient coding of the volume through the energy compaction in the lowpass subband. Furthermore, the quality of the lowpass subband can be increased by suitable approaches for motion compensation. By applying this coding scheme, quality scalability as well as spatial and temporal scalability can be achieved. The block diagram above shows the individual processing steps of 3-dimensional wavelet lifting.
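The split/predict/update structure of wavelet lifting, and why it yields exact lossless reconstruction with integer arithmetic, can be shown with one Haar lifting step on a 1-D signal. This is a minimal sketch; the compensated, multi-dimensional version for volume data follows the same scheme with motion-compensated predict and update operators.

```python
# One lifting step of the integer Haar wavelet transform: split the
# signal into even/odd samples, predict odd from even (highpass),
# then update even with the highpass (lowpass). Because the inverse
# repeats the same integer steps with opposite signs, reconstruction
# is exact (lossless).

def haar_lifting_forward(signal):
    """Forward lifting step; signal length must be even."""
    even, odd = signal[0::2], signal[1::2]
    highpass = [o - e for e, o in zip(even, odd)]           # predict
    lowpass = [e + h // 2 for e, h in zip(even, highpass)]  # update
    return lowpass, highpass

def haar_lifting_inverse(lowpass, highpass):
    """Invert the lifting steps exactly -> lossless reconstruction."""
    even = [l - h // 2 for l, h in zip(lowpass, highpass)]
    odd = [e + h for e, h in zip(even, highpass)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out

x = [10, 12, 14, 20, 5, 3, 8, 8]
lp, hp = haar_lifting_forward(x)
assert haar_lifting_inverse(lp, hp) == x   # perfect reconstruction
```

Applying the step recursively to the lowpass subband yields the multi-resolution representation; replacing the plain even-sample predictor with a motion-compensated one gives the compensated lifting used for dynamic volume data.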

Coding of Ultra Wide-Angle and 360° Video Data

Projection-based video coding

Andy Regensky, M.Sc.
Link to person

Ultra wide-angle and 360° video data is subject to a variety of distortions that do not occur in conventional video data recorded with perspective lenses. These distortions arise mainly because ultra wide-angle lenses do not follow the pinhole camera model and therefore have special imaging characteristics; straight lines, for example, appear curved on the image sensor. Only in this way can fields of view of 180° and more be achieved with a single camera. By means of so-called stitching processes, several camera views can be combined into a 360° video that allows a complete all-round view. Often this is achieved using two ultra wide-angle cameras, each capturing one hemisphere. To compress the resulting spherical 360° recordings using existing video codecs, the images must be projected onto a two-dimensional image plane. Various mapping functions are used for this purpose. Often, the equirectangular format is chosen, which is comparable to the representation of the globe on a world map and thus depicts 360° in the horizontal and 180° in the vertical direction.
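The equirectangular mapping just described can be written down directly: longitude and latitude on the sphere map linearly onto the image axes, with the image width covering 360° and the height covering 180°. The sketch below is a simplified illustration with an arbitrarily chosen resolution.

```python
# Sketch of the equirectangular projection: spherical coordinates
# (longitude in [-180°, 180°], latitude in [-90°, 90°]) are mapped
# linearly onto a W x H image with W = 2*H, covering 360° horizontally
# and 180° vertically. Simplified illustration only.

def sphere_to_equirect(lon_deg, lat_deg, width, height):
    """Map a point on the sphere to (x, y) pixel coordinates."""
    x = (lon_deg + 180.0) / 360.0 * width
    y = (90.0 - lat_deg) / 180.0 * height
    return x, y

# The equator/prime-meridian point lands in the image center:
print(sphere_to_equirect(0.0, 0.0, 4096, 2048))  # → (2048.0, 1024.0)
```

The linear mapping stretches content near the poles over the full image width, which is one reason conventional, translation-based motion compensation performs poorly on this format.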

Since conventional video codecs are not adapted to mapping functions deviating from the perspective projection, losses occur that can be reduced by taking the actual projection formats into account. In this project, therefore, different coding aspects are investigated and optimized with respect to the projections occurring in ultra wide-angle and 360° video data. A special focus lies on projection-based motion compensation and intra-prediction.