Next-Generation Video Communications

Our group works on a range of topics in image and video compression. We study state-of-the-art coding standards such as HEVC and develop new methods for upcoming standards such as VVC. In addition, we pursue new and unconventional approaches for coding content such as medical images and videos, fisheye videos, and 360° videos.

Contact person: Dr.-Ing. Christian Herglotz

Coding with Machine Learning

Video Coding for Deep Learning-Based Machine-to-Machine Communication

Kristian Fischer, M.Sc.

Commonly, video codecs are designed and optimized for humans as the final users. Nowadays, more and more multimedia data is transmitted for so-called machine-to-machine (M2M) applications, in which the data is not viewed by humans but processed by subsequent algorithms that solve various tasks. Application scenarios include smart industry, video surveillance, and autonomous driving. For these M2M applications, the detection rate of the subsequent algorithm is decisive rather than the subjective visual quality perceived by humans. To evaluate the coding quality for M2M applications, we are currently focusing on neural object detection with Region-based Convolutional Neural Networks (R-CNNs).

One interesting question in video compression for M2M communication systems is how far the original data can be compressed before the detection rate drops. In addition, we are testing modifications to current video codecs to achieve the best trade-off between compression and detection rate.
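
The question of how far the data can be compressed before the detection rate drops can be framed as a search over coding operating points. The following is a minimal sketch, assuming that the detection rate has already been measured at several quantization parameters (QPs, where a higher QP means stronger compression). The function name, the tolerance, and all numbers are hypothetical placeholders, not measurement results.

```python
def highest_tolerable_qp(measurements, baseline_rate, tolerance=0.02):
    """Walk through the QPs in increasing order and return the last one
    before the detection rate falls more than `tolerance` below the
    uncompressed baseline. Returns None if even the lowest QP fails.

    measurements: dict mapping QP -> detection rate in [0, 1].
    """
    best = None
    for qp in sorted(measurements):
        if baseline_rate - measurements[qp] > tolerance:
            break  # detection rate has dropped too far; stop searching
        best = qp
    return best


# Placeholder values for illustration only (not measured data).
example = {22: 0.80, 27: 0.80, 32: 0.79, 37: 0.74, 42: 0.60}
print(highest_tolerable_qp(example, baseline_rate=0.80))
```

Stopping at the first violation rather than taking the largest qualifying QP keeps the result meaningful even if the measured detection rate is not strictly monotonic.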


Deep Learning for Video Coding

Fabian Brand, M.Sc.

The increasing processing power of mobile devices will make it possible in the long term to employ deep learning techniques in coding standards. Many components of modern video coders can be implemented using neural networks. This research focuses on intra-frame prediction, a concept that has long been part of video coders. The technique estimates the content of a block solely from its spatial neighborhood, so that only the difference has to be transmitted. In contrast to so-called inter-frame prediction, which uses other reference pictures to reduce temporal redundancy, intra-frame prediction uses only the image being coded and thus reduces spatial redundancy.

Previous coding standards mainly use angular prediction, in which pixels from the border area are copied into the block at a certain angle. This method is very efficient but cannot predict non-linear structures. Since neural networks are so-called universal approximators, meaning that they can approximate arbitrary functions arbitrarily closely, they are also able to predict more complex structures. The following picture shows an example block that has been predicted with a traditional method and with a neural-network-based approach. We see that the neural network performs better in predicting the round shape in the block.

Left: Original, Center: Traditional Method (VTM 4.2), Right: Prediction with a neural network.
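
The copying step of angular prediction can be illustrated with a strongly simplified sketch. It assumes that only the reconstructed row above the block is used as reference and that the direction is an integer displacement per row. Real codecs additionally use the left neighbors, fractional angles, and interpolation filters, all of which are omitted here. The function name and parameters are hypothetical.

```python
def angular_predict(top_ref, block_size, shift_per_row):
    """Predict a square block by copying reference pixels from the row
    above the block along a fixed direction.

    top_ref: reconstructed pixels above the block, extended to the right
             (needs at least block_size * (1 + shift_per_row) samples).
    shift_per_row: integer horizontal displacement per row;
                   0 gives vertical prediction, 1 a 45-degree diagonal.
    """
    needed = block_size + block_size * shift_per_row
    if len(top_ref) < needed:
        raise ValueError("reference row too short for this direction")
    return [
        [top_ref[x + (y + 1) * shift_per_row] for x in range(block_size)]
        for y in range(block_size)
    ]


top = [10, 20, 30, 40, 50, 60, 70, 80]
vertical = angular_predict(top, block_size=4, shift_per_row=0)
diagonal = angular_predict(top, block_size=4, shift_per_row=1)
for row in diagonal:
    print(row)
```

Because every predicted pixel is a copy along one straight direction, such a predictor reproduces straight edges well but cannot follow a curved contour, which is where a learned predictor may help.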

Energy and Power Efficient Video Communications

Nowadays, video communications have conquered the mass markets such that billions of end-users worldwide make use of online video applications on highly versatile devices like smartphones, TVs, or tablet PCs. Recent studies show that 1% of the greenhouse gas emissions worldwide are related to video communication services (link). This number includes all factors in the video communication toolchain such as video capture, compression, storage, transmission, decoding, and replay. Due to the large impact and the potential rise in video communication demand in the future, it is highly important to investigate the energy efficiency of practical solutions and come up with new ideas to allow a sustainable use of this technology.

To tackle this important problem, our group is committed to research on energy-efficient video communication solutions. In the past, we constructed dedicated measurement setups to analyze the energy and power consumption of various hardware and software tools related to video communications. In terms of hardware, we tested desktop PCs, evaluation boards, smartphones, and distinct components of these devices. In terms of software, we investigated various decoders for multiple compression standards, hardware chips, and fully functional media players. With the help of this data, we were able to develop accurate energy and power models that describe the consumption in great detail. Additionally, these models allowed us to come up with new ideas to reduce and optimize the energy consumption.

We strive to dig deeper into this topic to obtain a fundamental understanding of all components contributing to the overall power consumption. Current topics include the encoding process, streaming and transmission issues, and modern video formats such as 360° video. For future work, we are always looking for interesting ideas and collaborations to develop new and promising solutions for energy-efficient video communications. We would be glad to hear from you if you are interested in supporting our work.

Currently, we are working on the following topics:

Energy Efficient Video Coding

Matthias Kränzler, M.Sc.

In recent years, the amount and share of video data in global internet traffic has been steadily increasing. Both the encoding on the transmitter side and the decoding on the receiver side have a high energy demand. Research on energy-efficient video decoding has shown that it is possible to optimize the energy demand of the decoding process. This research area deals with modeling the energy required for encoding compressed video data. The aim of the modeling is to optimize the energy efficiency of the entire video coding chain.

„Big Buck Bunny“ by Big Buck Bunny is licensed under CC BY 3.0


Energy Efficient Video Decoding

Dr.-Ing. Christian Herglotz

This field of research tackles the power consumption of video decoding systems. Both software and hardware systems are studied in great detail. A detailed analysis of the decoding energy on various platforms under various conditions can be found on the DEVISTO homepage:

Decoding Energy Visualization Tool (DEVISTO)

With the help of a high number of measurements, sophisticated energy models could be constructed which are able to accurately estimate the overall power and energy. A visualization of the modeling process for software decoding is given on the DENESTO homepage:

Decoding Energy Estimation Tool (DENESTO)

Finally, the information from the model can be exploited in rate-distortion optimization during encoding to obtain bit streams requiring less decoding energy. The source code of such an encoder can be downloaded here:

Decoding-Energy-Rate-Distortion Optimization for Video Coding (DERDO)
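
How an energy model can enter rate-distortion optimization can be sketched as follows. This is a minimal illustration, assuming that the classic Lagrangian cost (distortion plus weighted rate) is extended by a weighted estimate of the decoding energy. The cost form, the weights, and the candidate values are assumptions for illustration and are not taken from the DERDO source code.

```python
def select_mode(candidates, lambda_rate, lambda_energy):
    """Return the candidate coding mode with the lowest combined cost
    J = D + lambda_rate * R + lambda_energy * E.

    candidates: list of dicts with keys 'name', 'distortion', 'rate',
                and 'energy' (estimated decoding energy of the mode).
    """
    def cost(c):
        return (c["distortion"]
                + lambda_rate * c["rate"]
                + lambda_energy * c["energy"])
    return min(candidates, key=cost)


# Hypothetical candidates in arbitrary units (not measured values).
modes = [
    {"name": "A", "distortion": 100.0, "rate": 50.0, "energy": 10.0},
    {"name": "B", "distortion": 80.0, "rate": 60.0, "energy": 30.0},
]
print(select_mode(modes, lambda_rate=1.0, lambda_energy=0.0)["name"])
print(select_mode(modes, lambda_rate=1.0, lambda_energy=1.0)["name"])
```

With the energy weight set to zero the decision reduces to conventional rate-distortion optimization. Increasing the weight shifts the decision towards modes that are cheaper to decode.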

Coding of Medical Content

Scalable Lossless Coding of Dynamic Medical Data Using Compensated Multi-Dimensional Wavelet-Lifting

Daniela Wokusch, M.Sc.

This project focuses on scalable lossless coding of dynamic medical data. An efficient scalable representation of dynamic volume data from medical devices such as Computed Tomography scanners is very important for telemedicine applications. Lossless reconstruction is required by law and has to be guaranteed. Compensated Wavelet-Lifting combines scalability features and lossless reconstruction in a single processing step.

A wavelet transform (WT) decomposes a signal into a highpass and a lowpass subband. This allows the signal to be analyzed at multiple resolutions and enables efficient coding of the volume through energy compaction in the lowpass subband. Furthermore, the quality of the lowpass subband can be increased by suitable approaches for motion compensation. By applying this coding scheme, quality scalability as well as spatial and temporal scalability can be achieved. The block diagram above shows the individual processing steps of 3-dimensional Wavelet-Lifting.
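
Why a lifting structure permits lossless reconstruction can be seen in a one-dimensional sketch. It assumes the integer Haar wavelet with a predict and an update step and omits the compensation described above. In a compensated scheme, the two steps would operate on compensated samples instead of direct neighbors. The function names are hypothetical.

```python
def haar_lifting_forward(signal):
    """One level of integer Haar lifting on an even-length list of ints."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    even, odd = signal[0::2], signal[1::2]
    # Predict step: highpass = odd sample minus its prediction (the even sample).
    high = [o - e for e, o in zip(even, odd)]
    # Update step: lowpass = even sample plus half the highpass (floor rounding).
    low = [e + (h // 2) for e, h in zip(even, high)]
    return low, high


def haar_lifting_inverse(low, high):
    """Undo the lifting steps in reverse order with the same rounding."""
    even = [l - (h // 2) for l, h in zip(low, high)]
    odd = [h + e for h, e in zip(high, even)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend([e, o])
    return signal


samples = [5, 7, 3, 4]
low, high = haar_lifting_forward(samples)
assert haar_lifting_inverse(low, high) == samples
```

The rounding inside the update step is not invertible on its own, but the inverse subtracts exactly the same rounded value, so the integer input is recovered bit by bit. The lowpass band alone gives a half-resolution approximation, which is the basis of the scalability.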

Coding of Ultra Wide-Angle and 360° Video Data

Projection-based video coding

Andy Regensky, M.Sc.

Ultra wide-angle and 360° video data is subject to a variety of distortions that do not occur in conventional video data recorded with perspective lenses. These distortions occur mainly because ultra wide-angle lenses do not follow the pinhole camera model and therefore have special image characteristics. This becomes apparent, for example, in straight lines being imaged as curves on the image sensor. Only in this way can fields of view of 180° and more be achieved with a single camera. By means of so-called stitching processes, several camera views can be combined to form 360° video, which allows a complete all-round view. Often this is achieved by using two ultra wide-angle cameras, each capturing one hemisphere. To compress the resulting spherical 360° recordings using existing video codecs, the images must be projected onto the two-dimensional image plane. Various mapping functions are used for this purpose. Often, the equirectangular format is chosen, which is comparable to the representation of the globe on a world map and thus covers 360° in the horizontal and 180° in the vertical direction.
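
The equirectangular mapping mentioned above can be written down in a few lines. The sketch below assumes a common convention in which longitude increases to the right, latitude increases upwards, and the image origin is the top-left corner. Other conventions exist, and the function names are hypothetical.

```python
import math


def sphere_to_equirect(longitude, latitude, width, height):
    """Map a viewing direction to equirectangular image coordinates.

    longitude in [-pi, pi) and latitude in [-pi/2, pi/2], both in radians.
    The full 360 degrees span the image width, 180 degrees the height.
    """
    u = (longitude / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - latitude / math.pi) * height
    return u, v


def equirect_to_sphere(u, v, width, height):
    """Inverse mapping from image coordinates back to the sphere."""
    longitude = (u / width - 0.5) * 2.0 * math.pi
    latitude = (0.5 - v / height) * math.pi
    return longitude, latitude


# The direction straight ahead lands in the image center.
print(sphere_to_equirect(0.0, 0.0, width=4096, height=2048))
```

Since every image row covers the full 360° regardless of latitude, regions near the poles are strongly stretched horizontally. This is one example of the projection-dependent distortions that a conventional codec does not account for.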

Since conventional video codecs are not adapted to mapping functions deviating from the perspective projection, losses occur which can be reduced by taking the actual projection formats into account. Therefore, in this project different coding aspects are investigated and optimized with respect to the occurring projections of ultra wide-angle and 360° video data. A special focus lies on projection-based motion compensation and intra-prediction.

Coding of Screen Content

Screen content coding based on machine learning and statistical modelling

Hannah Och, M.Sc.

In recent years, the processing of so-called screen content has attracted increasing attention. Screen content refers to images that can typically be seen on desktop PCs, smartphones, or similar devices. Such images or sequences have very diverse statistical properties. Generally, they contain ‘synthetic’ content, namely buttons, graphics, diagrams, symbols, texts, etc., which have two significant characteristics: a small variety of colors and repeating patterns. In addition to these structures, screen content also includes ‘natural’ content such as photographs, videos, medical images, or computer-generated photo-realistic animations. Unlike synthetic content, natural images are characterized by irregular color gradients and a certain amount of noise. Screen content is typically a mixture of both synthetic and natural parts. The transmission of such images and image sequences is required for a multitude of applications such as screen sharing, cloud computing, and gaming.

Screen Content example containing ‘natural’ and ‘synthetic’ parts

However, screen content can be a challenge for conventional coding schemes, since they are mostly optimized for camera-captured (‘natural’) scenes and cannot compress screen content efficiently. Thus, this project focuses on the further development and performance evaluation of a novel compression method for lossless, visually lossless, or lossy coding of screen content images and sequences. Particular emphasis will be placed on a combination of machine learning and statistical modeling.

Coding of Point Cloud Data

Coding of point cloud geometry and attributes using deep learning tools

Dat Thanh Nguyen, M.Sc.

Point Clouds are becoming one of the most common data structures to represent 3D scenes, as they enable a six degrees of freedom (6DoF) viewing experience. However, a typical point cloud contains millions of 3D points and requires a huge amount of storage. Hence, efficient Point Cloud Compression (PCC) methods are essential to bring point clouds into practical applications. Unlike 2D images and videos, point clouds are sparse and irregular (see the image), which makes the compression task even more difficult.

Red and black point cloud from MPEG 8i dataset.

In recent years, the research community has been paying attention to this type of data, but the achievable compression rates are still below those of 2D image and video coding algorithms (JPEG, HEVC, VVC,…). With the help of recent advances in deep learning techniques, we aim to tackle the following challenges in PCC in this project:

  • Sparsity – most of the 3D space is empty; typically less than 2% of the space is occupied. However, exploiting this redundancy and encoding the non-empty space are not easy tasks.
  • Irregularity – unlike 2D images, where pixels are sampled uniformly over 2D planes, irregular sampling of point clouds makes it difficult to use traditional signal processing methods.
  • Huge spatial volume – the information contained in a single 10-bit point cloud frame is already equivalent to 1024 2D images of size 1024 × 1024. Such a point cloud would require an enormous number of computational operations when applying any kind of signal processing technique.
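
The numbers in the list above follow from simple arithmetic, which the sketch below reproduces. It assumes that a 10-bit point cloud means 10 bits per coordinate axis, that is, a voxel grid with 1024 positions per axis. The function name is hypothetical.

```python
def voxel_grid_stats(bit_depth, occupied_percent=2):
    """Size of the voxel grid spanned by a point cloud with the given
    bit depth per axis, expressed in voxels and in equivalent 2D images."""
    per_axis = 2 ** bit_depth                  # 1024 positions for 10 bits
    total_voxels = per_axis ** 3               # whole 3D grid
    pixels_per_image = per_axis * per_axis     # one 1024 x 1024 image
    return {
        "total_voxels": total_voxels,
        "equivalent_images": total_voxels // pixels_per_image,
        # Upper bound on occupied voxels implied by the percentage above.
        "max_occupied_voxels": total_voxels * occupied_percent // 100,
    }


stats = voxel_grid_stats(10)
print(stats["total_voxels"])        # 1073741824
print(stats["equivalent_images"])   # 1024
```

Even under the 2% bound, a dense 3D operation would spend at least 98% of its work on empty space, which illustrates why sparsity-aware representations are attractive.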

Point Clouds can be encoded and then used for different purposes such as VR, world heritage, medical analysis, etc. Thus, in this project, we investigate geometry and attribute coding in both lossless and lossy scenarios to provide solutions for various applications and purposes.