It may be late, but I still want to take this opportunity to give an overview of my GSoC project. First of all, I would really like to thank my mentor Peter Grasch and the KDE Community for accepting my proposal “Multimodal Accessibility: Using Computer Vision to improve Speech Recognition in Simon”. I would also like to thank Google for the stipend and for providing a platform for contributing to open source. This project will also be my starting point for contributing to the open-source community, and hence to mankind.
For those who don’t know about Simon: it is an open-source speech recognition program that replaces the mouse and keyboard.
A major obstacle for command-and-control speech recognition systems is differentiating commands from background noise. Many systems solve this with physical buttons or key phrases that activate/deactivate the speech recognition. This project explores using computer vision to decide when to activate/deactivate the recognition based on visual cues. For a media centre or a robot application, it makes a lot more sense to activate recognition only when the user is actively looking at the screen/robot and speaking. So we are not just detecting the face; we are also detecting whether the user is speaking by tracking their lip movements. This is strikingly similar to day-to-day communication between humans! Furthermore, in the current version of Simon, users have to activate/deactivate it manually or with voice commands; with this project, gestures can also control Simon's on/off state.
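To make the idea concrete, here is a minimal sketch of the activation decision described above: recognition is enabled only while a face is detected and the lips are moving, with a short hold-over so brief pauses inside a command don't switch recognition off. The class and method names are illustrative, not part of Simon's actual API.

```python
class VisionActivation:
    """Decide from per-frame visual cues whether recognition should listen."""

    def __init__(self, hold_frames=10):
        # Keep listening for a few frames after the lips stop moving, so a
        # short pause mid-command does not deactivate recognition.
        self.hold_frames = hold_frames
        self._countdown = 0

    def update(self, face_detected, lips_moving):
        """Feed one video frame's cues; return True if Simon should listen."""
        if face_detected and lips_moving:
            self._countdown = self.hold_frames
        elif self._countdown > 0:
            self._countdown -= 1
        return self._countdown > 0
```

In a real pipeline, `face_detected` and `lips_moving` would come from the computer vision stages each frame, and the returned flag would drive Simon's activation state.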
Before GSoC, I had already implemented part of a lip reader that detects lip movements to determine whether a person is speaking.
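One common way to get a crude lip-movement cue (a sketch of the general technique, not necessarily how my lip reader works) is frame differencing over the mouth region: compare the current grayscale mouth crop with the previous one, and treat a large mean pixel difference as movement. The function name and threshold here are illustrative assumptions.

```python
def lips_moving(prev_mouth, curr_mouth, threshold=10.0):
    """Crude lip-movement cue: mean absolute pixel difference between two
    grayscale mouth-region crops, given as flat lists of equal length."""
    if not prev_mouth or len(prev_mouth) != len(curr_mouth):
        raise ValueError("mouth crops must be non-empty and the same size")
    diff = sum(abs(a - b) for a, b in zip(prev_mouth, curr_mouth)) / len(prev_mouth)
    # Movement is reported only when the average change exceeds the threshold,
    # which filters out sensor noise and small lighting fluctuations.
    return diff > threshold
```

The mouth region itself would be located first, e.g. with a face detector plus facial landmarks; frame differencing is only the final, cheapest step of such a pipeline.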
Here are some demonstrations:
And yes, I am going to Akademy, the annual world summit of KDE.