Exploring Attention Models for Speech Enhancement

Proposal for a Master Thesis


In the field of machine learning, recurrent neural networks (RNNs) using long short-term memory (LSTM) have achieved impressive results in natural language processing, text prediction, speech enhancement and machine translation. An important development is the attention mechanism [1], which allows a neural network to assign weights to its input. In this way, the parts of the input to which the network should pay attention can be weighted more strongly than unimportant parts. In machine translation, this would mean emphasizing names, places and important verbs while neglecting filler words such as articles and conjunctions.
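As an illustration of this weighting idea, the following is a minimal sketch of (scaled) dot-product attention, one common realization of the mechanism in [1]. All names and shapes here are illustrative assumptions, not taken from a specific reference implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax: turns scores into weights summing to 1."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(query, keys, values):
    """Weight the input frames (values) by their relevance to the query.

    query:  (d,)   - what the network is currently looking for
    keys:   (T, d) - one descriptor per input frame (e.g., time frame)
    values: (T, d) - the input frames to be combined
    """
    # Similarity of the query to each input frame, scaled as in [1]
    scores = keys @ query / np.sqrt(query.shape[-1])
    # Attention weights: non-negative, sum to 1 over the T frames
    weights = softmax(scores)
    # Weighted combination: important frames contribute more
    return weights @ values, weights

# Illustrative usage with random data
rng = np.random.default_rng(0)
keys = rng.standard_normal((5, 4))
values = rng.standard_normal((5, 4))
query = rng.standard_normal(4)
context, weights = dot_product_attention(query, keys, values)
```

In a speech enhancement context, the `values` could, for instance, be features of noisy time-frequency frames, and the weights would indicate which frames the network emphasizes when producing its estimate.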

In this thesis project, a literature review shall be conducted to assess the potential of attention models for speech enhancement. This includes collecting different applications and developments based on attention models. Based on the literature survey, an algorithm showcasing the power of attention shall be chosen, implemented and compared to its non-attention-based equivalent, e.g., [2]. From there, possible variations of the model with respect to speech enhancement can be explored.



Prof. Dr.-Ing. Walter Kellermann


Annika Briegleb, M.Sc., room 05.021 (Cauerstr. 7),


Fundamental understanding of recurrent neural networks, Python programming, ideally including TensorFlow