Multimodal Feature Extraction and Fusion for Audio-Visual Speech Recognition

Event details

Date 16.01.2009
Hour 17:30
Speaker Mihai Gurban
Location ELA1
Category Thesis defenses
Multimodal signal processing yields higher-quality and more reliable information than can be obtained from single-modality signals. We focus on two main challenges in this field, feature extraction and multimodal fusion, and apply our proposed solutions to audio-visual speech recognition. First, we show how informative features can be extracted from the visual modality, using an information-theoretic framework that gives a quantitative measure of the relevance of individual features. We also prove that reducing redundancy between these features is important for avoiding the curse of dimensionality and improving recognition results. Second, we present a method of multimodal fusion at the level of intermediate decisions, using a weight for each of the monomodal streams. The weights are adaptive, changing according to the estimated reliability of each stream.
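
To make the idea of adaptive stream weighting concrete, the following is a minimal sketch and not the method presented in the thesis. It assumes that each stream outputs class posteriors for one frame and that reliability can be approximated by the entropy of those posteriors (a flatter distribution is treated as less reliable). The function names `entropy`, `reliability_weights`, and `fuse` are hypothetical and chosen for illustration only.

```python
import math


def entropy(posteriors):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in posteriors if p > 0)


def reliability_weights(audio_post, video_post):
    """Assumed reliability measure: 1 minus normalized posterior entropy.

    Returns (audio_weight, video_weight), which sum to 1.
    """
    max_h = math.log(len(audio_post))  # entropy of the uniform distribution
    r_audio = max(0.0, 1.0 - entropy(audio_post) / max_h)
    r_video = max(0.0, 1.0 - entropy(video_post) / max_h)
    total = r_audio + r_video
    if total <= 1e-12:  # both streams uninformative: fall back to equal weights
        return 0.5, 0.5
    return r_audio / total, r_video / total


def fuse(audio_post, video_post, eps=1e-12):
    """Weighted log-linear combination of the two streams' posteriors."""
    w_audio, w_video = reliability_weights(audio_post, video_post)
    scores = [
        w_audio * math.log(a + eps) + w_video * math.log(v + eps)
        for a, v in zip(audio_post, video_post)
    ]
    # Renormalize the combined scores into a probability distribution.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps], (w_audio, w_video)


# Example: noisy audio (nearly flat posteriors), confident video.
audio = [0.34, 0.33, 0.33]
video = [0.05, 0.90, 0.05]
fused, weights = fuse(audio, video)
```

In this example the video stream receives the larger weight because its posteriors are far from uniform, so the fused decision follows the video stream. A real system would recompute the weights over time as the estimated reliability of each stream changes.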

Practical information

  • General public
  • Free
