On the use of visual information for improving audio-based speaker recognition
A. Senior, C. Neti and B. Maison
In Proceedings of Audio Visual Speech Processing, Santa Cruz, California, 7-9 August 1999.
Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions, due to either channel or noise. In this paper, we explore various techniques to fuse video-based speaker identification with audio-based speaker identification to improve performance under mismatched conditions. Specifically, we explore techniques to optimally determine the relative weights of the independent decisions based on audio and video so as to achieve the best combination. Experiments on video broadcast news data suggest that significant improvements can be achieved by the combination in acoustically degraded conditions.
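The abstract does not say how the relative weights are estimated. What follows is a minimal sketch of one common option, not the authors' method. It assumes each modality yields one comparable score per enrolled speaker. It takes a linear combination of the audio and video scores and picks the audio weight by grid search on a held-out development set. The function names (`fuse`, `identify`, `choose_weight`), the 0.05 grid step, and the toy scores are hypothetical illustrations.

```python
from typing import List, Sequence, Tuple

# scores[segment][speaker]: higher means a better match.
# ASSUMPTION: audio and video scores are already on comparable scales.
Scores = Sequence[Sequence[float]]


def fuse(audio_row: Sequence[float], video_row: Sequence[float], w: float) -> List[float]:
    """Weighted sum: w multiplies the audio scores, (1 - w) the video scores."""
    return [w * a + (1.0 - w) * v for a, v in zip(audio_row, video_row)]


def identify(audio: Scores, video: Scores, w: float) -> List[int]:
    """Return the index of the highest fused score for each test segment."""
    decisions = []
    for a_row, v_row in zip(audio, video):
        fused = fuse(a_row, v_row, w)
        decisions.append(max(range(len(fused)), key=fused.__getitem__))
    return decisions


def choose_weight(audio: Scores, video: Scores, labels: Sequence[int],
                  steps: int = 20) -> Tuple[float, float]:
    """Hypothetical grid search over w in [0, 1]; keeps the first weight with the best accuracy."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        predicted = identify(audio, video, w)
        acc = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy development set (made-up numbers): 4 segments, 3 enrolled speakers.
labels = [0, 1, 2, 0]
audio_dev = [[0.3, 0.9, 0.1], [0.1, 0.8, 0.3], [0.7, 0.1, 0.2], [0.9, 0.1, 0.0]]
video_dev = [[0.8, 0.1, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]

if __name__ == "__main__":
    w, acc = choose_weight(audio_dev, video_dev, labels)
    print(f"audio weight {w:.2f}, dev accuracy {acc:.2f}")  # audio weight 0.15, dev accuracy 1.00
```

On these toy scores, audio alone (`w = 1`) identifies 2 of 4 segments and video alone (`w = 0`) identifies 3 of 4. Several intermediate weights identify all four. The sketch keeps the smallest of these. On real data, a weight tuned under one acoustic condition would presumably need re-estimating when the channel or noise level changes. That shift is the mismatch the paper targets.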