It is less than a week now. I’m very excited to attend Akademy 2013, the annual KDE Conference, this year for the second time.
This year it will be hosted in Bilbao, in the Basque Country of Spain. Along with +Devaja Shah, I will be giving a talk titled “A Revolution in itself”. It will focus on KDE India and highlight KDE Meetup 2013, held in India: the successes, the obstacles, the entire journey and the outcomes, and how it influenced the lives of many young students who were inspired to bring about a revolution by contributing to open source and KDE. We hope it will be a motivation for organizing many more such events in the future. If you are planning to attend, do not forget to register yourself here.
I would like to thank KDE e.V. for sponsoring my travel and giving me the opportunity to interact and share thoughts with awesome community people, which will be a life-changing experience again!
After years of hard work by the Simon team, Simon 0.4.0 is out, the first major release in a long time.
This new version of the open source speech recognition system Simon features a whole new recognition layer, context-awareness for improved accuracy and performance, a dialog system able to hold whole conversations with the user and more.
You can read about the new release in detail here.
It feels amazing when something you implemented during Google Summer of Code 2012 is part of the release. Also, this was the first time I was part of any release.
I know it is very late to post about my experience during GSoC 2012, but I do not want to miss this opportunity to share it with you all before this year ends.
GSoC 2012 package
Let me brief you again on my project. It is based on Multimodal Accessibility: I am using computer vision to improve speech recognition in Simon. A major obstacle for command and control speech recognition systems is differentiating commands from background noise, so in my project I use computer vision to determine when to activate / deactivate the sound recognition, using visual cues such as whether the user is actively looking at the screen/robot and is speaking.
Why did I pick this project in particular?
I came to know about GSoC from our seniors, and the learning experiences they got out of it really motivated me to get into it. So I started searching for projects before the organisation list was even announced. Fortunately, I saw this project idea and really liked it. I had also been working on face detection lately, I knew from seniors that KDE is an awesome community, and computer vision is one of my favourite fields. I love to work on things that replace the normal way of using computers and the physical mouse/keyboard. All these factors together drove me towards this project and kept me motivated. So I contacted Peter and we discussed the ideas; it was the first time I stepped onto IRC. Then I kept discussing it on IRC and the mailing list, and that’s how it all started.
Experience
It has been an incredible learning experience and I’m very happy with the final results. I am more positive and confident than ever. I learnt a lot of basic stuff like git, makefiles, building, executing and debugging code, and much more. This project also acted as a starting point for my contribution to the open source community. It was my first “serious” project with such a big codebase, but Simon is nicely documented, and with Peter’s guidance I was able to adjust very soon.
I also faced many challenges during the period. It was really tough to adhere to the timeline. The CMake build system and git were very new to me. There were many unexpected bugs which surprised me a lot, but then it was so much fun figuring them out and fixing them. Working on the UI was also time-consuming.
KDE also invited me to Tallinn, Estonia for Akademy 2012, their annual conference, and sponsored my trip. It was my first international journey. It gave me the opportunity to meet in person people whom I knew only by their IRC nicks. The opportunity to interact and share thoughts with highly intelligent and experienced minds was a life-changing experience and the biggest takeaway, and it would not have been possible without the support of Google, KDE and my mentor.
I also participated in the Randa meeting 2012 in Switzerland as part of the KDE Accessibility team. It was my first sprint ever and was really productive. I implemented vision configuration and fixed many bugs there. I would again like to thank Mario Fux for organizing this fantastic event, and all the sponsors and donors who made it possible.
End Result
Peter has recorded a great video on context awareness which covers most of the things I implemented during GSoC 2012.
As you can clearly see in the video, Simon has turned into a multimodal speech recognition system. Simon will deactivate the input devices in the absence of the user. This is strikingly similar to the day-to-day communication between humans!
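To make the "deactivate in the absence of the user" idea concrete, here is a minimal sketch in Python. It is not Simon's actual code: the `PresenceGate` class, the `grace_frames` parameter and the per-frame `face_visible` flag are all hypothetical names, and the sketch assumes some face detector already tells us, frame by frame, whether a face was found. The grace period is my own assumption, added so that a single missed detection does not switch the recognition off.

```python
from dataclasses import dataclass


@dataclass
class PresenceGate:
    """Keeps recognition on while a user is visible (hypothetical helper)."""
    grace_frames: int = 3  # assumed: missed frames tolerated before switching off
    active: bool = False
    missed: int = 0

    def update(self, face_visible: bool) -> bool:
        """Feed one camera frame's detection result, get the new on/off state."""
        if face_visible:
            self.missed = 0
            self.active = True
        else:
            self.missed += 1
            if self.missed >= self.grace_frames:
                self.active = False
        return self.active


def run_gate(frames, grace_frames=3):
    """Run the gate over a sequence of per-frame face detections."""
    gate = PresenceGate(grace_frames=grace_frames)
    return [gate.update(face_visible) for face_visible in frames]


if __name__ == "__main__":
    # The user looks away briefly, comes back, then leaves for good.
    frames = [True, True, False, False, True, False, False, False, False]
    print(run_gate(frames))
```

With the default grace period, the short look-away in the middle of the sequence keeps the recognition active, and only the longer absence at the end turns it off.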
Acknowledgement:
I owe a big part of this success to my mentor Peter Grasch for always being there to answer my questions, offer advice and review the code. I have learnt a lot from him and I am sure I have improved a lot as a programmer. The best thing about working with him was that he never really disclosed the solution; instead he gently guided me in the direction of the solution, so I never lost a learning opportunity.
And thanks to lots of other people in the community as well, whose names I am forgetting. While I am at it, I would like to thank my friends for putting up with me when I slept during the day and worked at night.
And more than anything else, I am very happy to make my parents proud after so many years of constant hard work they have put in and the sacrifices they have made so that I could chase my dream of becoming a computer engineer. I hope it’s the first of many more proud moments that I will be giving them in the future.
What’s Next?
I would like to maintain this project after GSoC and continue contributing to Simon/KDE. So stay tuned, there is much more to come.
To Future GSoC Aspirants:
I would suggest maintaining good communication with the community and trying to be involved with the project as much as possible.
Peter advised me to draw the class diagrams before starting my project, and it really helped me later on. I know we all have a habit of starting directly with coding, but I would highly recommend having a proper structure diagram ready before you start to code; it will give you a clear idea of the implementation.
Try to budget a lot of extra time in your project application: most of us are not experienced developers and cannot correctly estimate the amount of work needed for something. Plus, when additional problems arise (and they will), it’s always better to have time set aside to deal with them. I would highly recommend discussing this with your mentor before submitting your proposal.
Finally, nothing is too hard to accomplish if you love what you do.
Hurray! I am very happy and excited that I am finally going to Tallinn, Estonia to attend my first Akademy ever. After a long wait and huge efforts, I got my visa a few hours back. It will be really awesome to meet the KDE family, whom I knew only through IRC.
First of all, I would really like to thank KDE e.V. for sponsoring my travel and accommodation.
For those who don’t know about Akademy, it is the annual world summit of KDE, which is one of the largest Free Software communities in the world. It is a free, non-commercial event organized by the KDE Community. This year marks 15 years of KDE AND the 10th edition of the KDE Community Summit. Akademy features a 2-day conference with presentations on the latest KDE developments. It will be followed by 5 days of workshops, birds of a feather (BoF) and coding sessions where participants meet, discuss, work on projects and launch new initiatives.
It may be late, but I still want to take this opportunity to give a brief idea of my GSoC project. First of all, I would really like to thank my mentor Peter Grasch and the KDE Community for accepting my proposal “Multimodal Accessibility: Using Computer Vision to improve Speech Recognition in Simon”. I would also like to thank Google for giving me a stipend and providing me a platform for contributing to open source. This project will also act as a starting point for my contribution to the open source community and hence to mankind.
For those who don’t know about Simon, it is an open source speech recognition program which can replace the mouse and keyboard.
A major obstacle for command and control speech recognition systems is differentiating commands from background noise. Many systems solve this by using physical buttons or certain key phrases to activate/deactivate the speech recognition. This project explores the use of computer vision to determine when to activate / deactivate the sound recognition using visual cues. For media centre or robot applications, it would make a lot more sense to only activate the recognition when the user is actively looking at the screen/robot and is speaking. So we are not just detecting the face, we are also detecting whether the user is speaking or not by detecting the user’s lip movements. This is strikingly similar to the day-to-day communication between humans! Furthermore, in the current version of Simon, users have to activate/deactivate Simon manually or with voice commands. In addition to that, we can use gestures to control the on/off state of Simon.
Before GSoC, I implemented part of a lip reader that detects lip movements to tell whether a person is speaking or not.
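The decision described above (face visible AND lips moving) can be sketched roughly as follows. This is only an illustration, not the code from Simon or from my lip reader: the function names, the `threshold` value and the idea of feeding in "mouth opening" measurements (mouth height divided by face height, taken from the last few frames) are all assumptions made for the example. Here I simply treat the lips as moving when those measurements vary enough.

```python
from statistics import pstdev


def lips_moving(mouth_openings, threshold=0.05):
    """Guess whether the user is speaking from recent mouth measurements.

    mouth_openings: assumed input, mouth height / face height for the
    last few frames. threshold is a made-up value for illustration.
    """
    if len(mouth_openings) < 2:
        return False  # not enough frames to see any movement
    # A mouth that opens and closes gives a large spread of values.
    return pstdev(mouth_openings) > threshold


def should_listen(face_visible, mouth_openings, threshold=0.05):
    """Activate recognition only if a face is seen and the lips are moving."""
    return face_visible and lips_moving(mouth_openings, threshold)


if __name__ == "__main__":
    talking = [0.10, 0.30, 0.12, 0.28]
    silent = [0.10, 0.11, 0.10, 0.11]
    print(should_listen(True, talking))   # user faces the screen and talks
    print(should_listen(True, silent))    # user faces the screen, mouth still
    print(should_listen(False, talking))  # someone talks, but no face is seen
```

Only the first case should activate the recognition; background speech with no face in view, or a silent user looking at the screen, leaves it off.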
Here are some demonstrations:
And yeah, I am going to Akademy, the annual world summit of KDE.