Multimodal Feature Extraction and Fusion for Audio-Visual Speech Recognition

Event details
Date | 16.01.2009 |
Hour | 17:30 |
Speaker | Mihai Gurban |
Location | ELA1 |
Category | Thesis defenses |
Multimodal signal processing leads to the extraction of higher-quality
and more reliable information than could be obtained from
single-modality signals. We focus on two main challenges in
this field, feature extraction and multimodal fusion, and apply
our proposed solutions to audio-visual speech recognition.
First, we show how informative features can be extracted from the
visual modality, using an information-theoretic framework which gives
us a quantitative measure of the relevance of individual features. We
also prove that reducing redundancy between these features is
important for avoiding the curse of dimensionality and improving
recognition results. Second, we present a method of multimodal fusion
at the level of intermediate decisions using a weight for each of the
monomodal streams. The weights are adaptive, changing according to the
estimated reliability of each stream.
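
As a rough illustration of the fusion idea in the abstract, the sketch below combines per-class log-likelihoods from an audio stream and a visual stream using one weight per stream. The reliability estimate (how far each stream's posterior entropy falls below the maximum) is an assumption chosen for illustration and is not necessarily the measure used in the thesis. All function names are hypothetical.

```python
import math


def softmax(log_scores):
    """Turn per-class log-likelihoods into a normalized posterior."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def stream_weights(audio_loglik, video_loglik):
    """Assumed reliability measure: a stream whose posterior is peaked
    (low entropy) is treated as more reliable than a flat one."""
    max_h = math.log(len(audio_loglik))
    rel_a = max(0.0, max_h - entropy(softmax(audio_loglik)))
    rel_v = max(0.0, max_h - entropy(softmax(video_loglik)))
    total = rel_a + rel_v
    if total < 1e-12:
        # Neither stream is informative: fall back to equal weights.
        return 0.5, 0.5
    w_a = rel_a / total
    return w_a, 1.0 - w_a


def fuse(audio_loglik, video_loglik):
    """Weighted sum of the two streams' log-likelihoods, class by class."""
    w_a, w_v = stream_weights(audio_loglik, video_loglik)
    return [w_a * a + w_v * v for a, v in zip(audio_loglik, video_loglik)]


def recognize(audio_loglik, video_loglik):
    """Index of the class with the highest fused score."""
    fused = fuse(audio_loglik, video_loglik)
    return max(range(len(fused)), key=fused.__getitem__)
```

Because the weights are recomputed for every pair of score vectors, a stream that becomes uninformative (for example, audio under heavy noise) contributes less to the fused decision at that moment.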
Practical information
- General public
- Free