Google Summer of Code 2012

Hello Everyone,

It may be late, but I still want to take this opportunity to give an overview of my GSoC project. First of all, I would really like to thank my mentor Peter Grasch and the KDE community for accepting my proposal “Multimodal Accessibility: Using Computer Vision to improve Speech Recognition in Simon”. I would also like to thank Google for the stipend and for providing a platform for contributing to open source. This project will also be the starting point of my contributions to the open source community, and hence to mankind.

For those who don’t know about Simon: it is an open source speech recognition program that can replace the mouse and keyboard.

A major obstacle for command and control speech recognition systems is telling commands apart from background noise. Many systems solve this with physical buttons or certain key phrases that activate/deactivate the speech recognition. This project explores the use of computer vision to decide when to activate/deactivate the speech recognition based on visual cues. For a media centre or a robot application, it makes a lot more sense to activate the recognition only when the user is actually looking at the screen/robot and is speaking. So we are not just detecting the face; we are also detecting whether the user is speaking, by tracking the user's lip movements. This is strikingly similar to day-to-day communication between humans! Furthermore, in the current version of Simon, users have to activate/deactivate Simon manually or with voice commands. In addition to that, gestures could be used to control the on/off state of Simon.
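To make the idea a bit more concrete, here is a rough sketch of the gating logic in Python. This is only an illustration and not Simon's actual code: the `VisualGate` class, its parameters, and the `mouth_opening` value (assumed to be some per-frame measure of how far the lips are apart, produced by a face/lip detector that is not shown) are all hypothetical names made up for this example.

```python
from collections import deque


class VisualGate:
    """Decides whether speech recognition should listen, based on visual cues.

    Hypothetical sketch: it assumes a separate detector already provides,
    for every video frame, a face_detected flag and a mouth_opening value.
    """

    def __init__(self, window=5, motion_threshold=0.05, hold_frames=3):
        self.openings = deque(maxlen=window)      # recent mouth-opening values
        self.motion_threshold = motion_threshold  # minimum change that counts as lip movement
        self.hold_frames = hold_frames            # still frames tolerated before switching off
        self.quiet_frames = 0
        self.active = False

    def update(self, face_detected, mouth_opening):
        """Feed one frame; returns True while recognition should be active."""
        if not face_detected:
            # Nobody is looking at the camera: forget history and stop listening.
            self.openings.clear()
            self.quiet_frames = 0
            self.active = False
            return self.active

        self.openings.append(mouth_opening)
        # Lip movement is approximated by how much the opening varied recently.
        motion = max(self.openings) - min(self.openings)

        if motion >= self.motion_threshold:
            self.quiet_frames = 0
            self.active = True
        elif self.active:
            # Lips are still: keep listening for a few frames, then deactivate.
            self.quiet_frames += 1
            if self.quiet_frames >= self.hold_frames:
                self.active = False
        return self.active
```

The short hold period is a design choice of this sketch: people pause briefly between words, so switching off on the very first still frame would probably cut commands in half. The right thresholds would have to be tuned on real camera data.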

Before GSoC, I had already implemented part of a lip reader that detects lip movements to determine whether a person is speaking.

Here are some demonstrations:


And yeah, I am going to Akademy, the annual world summit of KDE.

30th June – 6th July 2012, Tallinn, Estonia

  • Arnalda

    nice article…

  • Corina

    very nice and interesting keep posting more…

  • Priyanka wagh

    Hi Yash,
    I am doing my M.E. right now and expect to do a project based on reading lip movements and converting the spoken words into text using image processing. Can you please help me with this topic? Can you please provide me with some study material? Can you please send me some research papers and journal papers about the latest research in this field?

    I hope you will be willing to help me. Thank you, and all the best for your great efforts in your work.

    • admin

      Hi Priyanka,
      Thank you for your kind words. Sure, I would like to help you with your project. I will mail you the details.

      Good luck!