Multimodal Feature Extraction and Fusion for Audio-Visual Speech Recognition

Event details

Date	16.01.2009
Hour	17:30
Speaker	Mihai Gurban
Location	ELA1
Category	Thesis defenses

Multimodal signal processing leads to the extraction of higher-quality and more reliable information than could be obtained from single-modality signals. We focus on two main challenges in this field, feature extraction and multimodal fusion, and apply our proposed solutions to audio-visual speech recognition. First, we show how informative features can be extracted from the visual modality, using an information-theoretic framework that gives a quantitative measure of the relevance of individual features. We also prove that reducing redundancy between these features is important for avoiding the curse of dimensionality and improving recognition results. Second, we present a method of multimodal fusion at the level of intermediate decisions, using a weight for each of the monomodal streams. The weights are adaptive, changing according to the estimated reliability of each stream.
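
The weighted fusion of intermediate decisions described above can be illustrated with a minimal sketch. It is not the method from the thesis: the abstract does not say how stream reliability is estimated, so this example assumes, purely for illustration, that a stream whose class posterior has low entropy is more reliable. The function names (entropy, reliability_weights, fuse) are hypothetical.

```python
import math


def entropy(posteriors):
    """Shannon entropy (in nats) of a discrete posterior distribution."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def reliability_weights(audio_post, video_post):
    """Assumed reliability estimate: lower posterior entropy gives a higher weight.

    Both inputs are posteriors over the same classes (at least two).
    Returns (w_audio, w_video), which sum to 1.
    """
    max_h = math.log(len(audio_post))  # entropy of a uniform posterior
    # Confidence in [0, 1]: 1 for a one-hot posterior, 0 for a uniform one.
    conf_a = max(0.0, 1.0 - entropy(audio_post) / max_h)
    conf_v = max(0.0, 1.0 - entropy(video_post) / max_h)
    total = conf_a + conf_v
    if total == 0.0:
        return 0.5, 0.5  # neither stream is informative: weight them equally
    return conf_a / total, conf_v / total


def fuse(audio_post, video_post):
    """Combine per-class log-posteriors of both streams with adaptive weights."""
    w_a, w_v = reliability_weights(audio_post, video_post)
    eps = 1e-12  # avoids log(0)
    scores = [
        w_a * math.log(pa + eps) + w_v * math.log(pv + eps)
        for pa, pv in zip(audio_post, video_post)
    ]
    # Renormalise the weighted scores into a distribution (softmax).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


# Noisy audio (nearly uniform posterior) and a confident visual stream.
audio = [0.30, 0.35, 0.35]
video = [0.90, 0.05, 0.05]
fused = fuse(audio, video)
```

In this example the nearly uniform audio posterior receives a weight close to zero, so the fused decision follows the visual stream. In a recogniser the weights would be recomputed over time as acoustic or visual conditions change.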
