Collecting speech data for under-resourced languages - MSc Voice Technology

Date: 10 November 2022

Dragoș Bălan, an MSc Voice Technology student from Romania, recorded over 2000 speech samples and validated another 1500 in his native language. With this, he won the Voice Technology - Common Voice competition and its prize, a Google Nest Mini, sponsored by the Province of Fryslân in collaboration with Mozilla Fryslân. We spoke with Dragoș about this achievement and the role his studies played in it.

Why did you decide to study MSc Voice Technology?

I came to the Netherlands three years ago to study Computer Science and Engineering at TU Delft, where I took the multimedia track, which deals with images, signals, voices and sounds. This field really interested me, and this Master's programme was a great way to continue in it and dive deeper.

What was the Common Voice competition about?

The goal of the competition was to contribute to under-resourced languages: languages that do not have enough speech recordings available to develop systems based on voice technology. An example is Vietnamese, which has over 20 million speakers but is still considered under-resourced, because there are too few recordings of people speaking it to develop speech recognition applications that accurately understand what you're saying, or text-to-speech applications that can speak Vietnamese.

You took part by creating an account on Common Voice, Mozilla's website, and then either recording yourself reading sentences and submitting them, or validating other people's recordings to check whether they pronounced the sentence correctly or repeated themselves. The goal of Common Voice is to build an open-source voice dataset. The focus is on research, but because it is open source, everyone can use it.

How did you contribute to the Common Voice competition and what was your motivation for doing so?

My own language, Romanian, is very under-resourced, so I was interested in improving that. And of course the prize provided by the Province of Fryslân, a Google Nest Mini, was also good motivation [laughs]. Those were my two main motivations.

How did you go about recording over 2000 speech samples and validating 1500 speech samples in your native language?

I set myself a personal goal of at least 50 recordings and 50 validations a day, and on some days I did more. Recording in particular became a sort of game: new sentences kept coming and you wanted to keep going. This is also why I ended up with more recordings than validations. Since each recording takes about 10 seconds, if you record for 30 minutes you already have 150 samples.

So, in total you recorded almost 6 hours of speech. How does that relate to the total amount of Romanian speech in the Common Voice project?

Currently there are around 35 or 36 hours of Romanian data. That may sound like a lot, but most applications need at least 100 to 150 hours of data to build a system in a given language. Of course, I'm not the only one contributing. Still, if I can brag a little, I am currently in second place for the total number of recordings by Romanian speakers in the Common Voice project.
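The figures in this answer can be checked with a quick back-of-the-envelope calculation, sketched here in Python. It assumes, as in the interview, that every clip lasts about 10 seconds; real Common Voice clips vary in length, and the helper names `clips_to_hours` and `clips_needed` are purely illustrative.

```python
import math

# Assumption: every clip lasts about 10 seconds, the rough figure quoted
# in the interview. Real Common Voice clips vary in length.
SECONDS_PER_CLIP = 10


def clips_to_hours(num_clips: int, seconds_per_clip: float = SECONDS_PER_CLIP) -> float:
    """Convert a number of recorded clips into hours of speech."""
    return num_clips * seconds_per_clip / 3600


def clips_needed(target_hours: float, seconds_per_clip: float = SECONDS_PER_CLIP) -> int:
    """Number of clips required to collect `target_hours` of speech."""
    return math.ceil(target_hours * 3600 / seconds_per_clip)


dragos_hours = clips_to_hours(2000)
current_total = 35  # approximate hours of Romanian data mentioned in the interview
target_total = 100  # lower end of the 100-150 hour range

print(f"{dragos_hours:.1f} hours recorded")  # 5.6 hours recorded
print(f"{dragos_hours / current_total:.0%} of the current total")  # 16% of the current total
print(f"{clips_needed(target_total - current_total):,} more clips to reach 100 hours")  # 23,400 more clips to reach 100 hours
```

Because it relies on an average clip length, this is only an order-of-magnitude estimate; actual totals depend on the real duration of each recording.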

What tip would you give others who want to contribute to the Common Voice programme?

Just go to the Mozilla Fryslân website and start recording; it's as simple as that. Or start validating, of course, which is just as important.

