Unpacking Voice Technology with Dr. Shekhar Nayak

Date: 14 December 2023

Curious about the MSc Voice Technology and the academic expertise of the professors shaping the master's programme? In today's interview, we're meeting Dr. Shekhar Nayak, Assistant Professor in Speech Technology. Join us as Dr. Nayak guides us through the programme, shedding light on aspects that might still be unfamiliar to many.

Let's begin with the basics: could you please share details about your academic background and overall expertise?

My educational background is rooted in Electronics and Communications Engineering. I pursued a Master’s in Signal Processing and obtained a Ph.D. in Speech Recognition from the Indian Institute of Technology (IIT) Delhi and IIT Hyderabad respectively. During my master’s degree, human-machine interaction and speech communication particularly intrigued and inspired me.

Throughout my career, I have predominantly worked in industry, holding positions such as Senior Chief Engineer at Samsung R&D, where I focused on research in speech technology, and Technology Consultant at Hewlett-Packard. However, I have always felt a compelling connection with academia, and my passion for teaching has persisted. For a long time, though, it proved challenging to find a programme that combined my expertise with my interest in deep learning and speech technology solutions.

This changed when I discovered the MSc Voice Technology programme, a unique blend of advanced speech recognition, synthesis, and more. It was a programme unlike any I had encountered globally. I immediately connected with Dr. Matt Coler, the driving force behind the entire programme, and felt a strong resonance with his conceptualization.

Moving on to your role as an educator, could you share insights into the specific subjects you teach and the focal points of these courses?

I have been involved in quite a number of courses over the years.

In the Programming course, the emphasis lies on instructing computers to perform tasks across various real-world domains. The course essentially tries to teach students the language of computers.

In the Machine Learning course, we learn about how computers can do certain tasks without the need for explicit programming. Instead, they rely on past data and experiences to continually enhance their capabilities.

In Speech Recognition, the focus is on converting speech to text, offering a unique perspective beyond traditional AI paradigms. This provides us with the ability to use speech as a medium for interacting with machines.

Moving on to Speech Synthesis, the focus shifts to transforming text into speech, which enhances computer outputs by making them more interactive and human-like.

In addition to these subjects, I am currently contributing to the design of courses for one of our two bachelor’s programmes, namely Data Science & Society.

Let's go back for a moment. What initially piqued your interest in the field of voice technology?

During my Master of Technology studies (2009-2011) at IIT Delhi, I got a chance to take two amazing courses: Detection and Estimation Theory by Prof. S.D. Joshi and Human and Machine Speech Communication by Prof. Arun Kumar. Together, these two courses motivated me to explore signal processing specifically as it pertains to speech signals. I had no idea that we were on the verge of a great AI revolution, especially in the field of Voice Technology. I went on to do my PhD in the Speech Tech domain under Prof. K. Sri Rama Murty at IIT Hyderabad, which gave me a highly enriching experience in this field. Thus, my career is simply the fruition of the seeds of interest sown by these great professors, and through teaching I try to return, in any small way possible, what I have received.

Shifting our focus to our programme, what distinguishes it from others and makes it appealing to students?

The programme is unique, highly specialized, and very hands-on, with significant technical rigor, yet welcoming to students from various disciplines. Successfully completing it not only opens doors to Speech Tech research and industry but can also open possibilities in allied domains such as Machine Learning and Natural Language Processing. In my view, the programme has two major values: innovation and inclusion. The students develop a very innovative mindset towards the speech technology paradigm, enabling them to develop new and better technologies in this domain. They also get a chance to learn to develop technologies for under-represented languages, or for people who lack proper access to assistive technologies but really need them for a better quality of life.

Any additional insights you think would be helpful for prospective students?

The current surge of interest in the field of AI is creating numerous opportunities for graduates in voice technology. Many Speech Tech companies are on the advisory board of the programme and also recruit our graduates. The domains in which these companies work range from assistive technology (creating synthetic voices for people with speech impairments) to entertainment (dubbing and gaming) and many more diverse areas.

The programme itself may seem challenging, but the teaching staff, including Matt, our programme director, are welcoming, friendly, and committed to connecting with students. This ensures that students, regardless of their background, feel supported throughout the programme.

Tags: Voice Technology, teachers
