‘Hey Google! Tell me a joke.’ Speech technology is increasingly becoming part of our everyday life. For many people, it is no more than a fun gimmick built into a telephone or a virtual assistant. But the underlying technology can be used for very serious applications, for example in the medical world or in preserving languages and cultures, explains Matt Coler.
Text: Bart Talens/Industry Relations, UG
Matt Coler is an associate professor in Language and Technology at the University of Groningen Campus Fryslân. Together with his research group, he works in the field of culture, language and technology, studying various aspects of speech technology. Speech technology is defined as the use of artificial intelligence to automatically convert speech to text or written text to speech, or to automatically recognize specific speech or voice features. Coler is supervising PhD students Vass Verkhodanova and Phat Do, who are conducting research on this topic.
Verkhodanova investigates how neurologists can recognize neurodegenerative diseases, such as Parkinson’s disease, based on people’s speech and voice. For example, some doctors are very good at recognizing dysarthria, an articulatory disorder. Without running any tests, they can recognize the disorder immediately when they hear a patient speak. Coler compares this to standing in a busy metro station in a Spanish-speaking country. If you hear someone speak Dutch in such a setting, your brain will easily single it out from the noise and the sound will reach you, even though it is probably not very salient acoustically. This is because the brain picks out certain signals from the sound and places extra emphasis on this information.
Verkhodanova investigates how such signals in voice and speech use develop in different disease profiles, and how you can use artificial intelligence to automate recognition. In the next research phase, doctors could, for instance, monitor the development of a disease in a patient via telephone conversations. This would make care easier and more accessible for many people. Someone with a higher risk of developing a specific neurodegenerative disease could then be automatically screened via a short telephone conversation.
Phat Do, Coler’s other PhD student, is investigating speech synthesis and is working on creating a computer voice for the Frisian language. To develop a computer voice, the program has to integrate large amounts of speech data from the relevant language. Some smaller languages only have a few small speech corpora, and it is not always possible to collect more data. This is why in training his Frisian computer voice, Do uses both a Frisian data set and additional speech data from other languages, with a focus on ‘imitating’ a natural melody and sound. Thanks to this exceptional technology, Do is developing an artificial ‘voice’ that not only speaks Frisian, but also sounds very natural.
Campus Fryslân researchers take an interdisciplinary approach, in which collaboration between researchers, the corporate sector and societal partners plays a central role. Coler’s research group, with its application-driven focus, is certainly no exception. Coler: ‘If you wonder about the usefulness of these kinds of applications, it is good to realize that we are not studying gadgets here. This technology has very important real-life applications: think of patients with throat cancer, who can no longer speak in a healthy way or who suffer from language production disorders. A natural-sounding artificial voice can greatly improve these patients’ quality of life.’
Coler explains that big tech companies also work a lot with artificial speech. Think of Apple’s Siri or Amazon’s Alexa. ‘These companies work on the basis of a revenue model. They are not likely to include dialects or minority languages in their technology, simply because there is no market for it. However, this does not mean that there is no need for it.’ He continues: ‘This is where universities have a responsibility to work between the market and society, to continue to develop these technologies and prepare them for various uses. Instead of competing with big companies on basic speech technology applications, our research group tries to diversify the field by focusing on applications that improve quality of life.’
The development of speech technology therefore has great potential for societal impact and relevance, a potential that is not yet fully realized. Speech technology applications can even play a role in court cases. For example, the main suspect in the trial over the death of American teenager Trayvon Martin was acquitted partly because of his claim that he could be heard shouting for help in a recorded phone call to the emergency services. Many people doubted whether the voice in question really belonged to the suspect. In such cases, voice recognition could play a crucial and decisive role in the administration of justice. In addition, the medical world has a great need for applications for patients. And speech synthesis can contribute to protecting and preserving endangered languages.