Praat mar Frysk (mei amper data): speech synthesis for low-resource languages with cross-lingual transfer learning

PhD ceremony: T.P. (Phat) Do, PhD
When: June 19, 2025
Start: 11:00
Supervisor: M.L. (Matt) Coler, PhD
Co-supervisors: dr. J.E. Dijkstra, dr. E. Klabbers
Where: Campus Fryslân
Faculty: Campus Fryslân

Text-to-Speech (TTS) is the generation of artificial speech from text and is used extensively in voice assistants (such as Siri and Alexa), accessibility tools, and language learning applications. While modern TTS has achieved very high quality and naturalness in major languages like English and Chinese, many of the world’s languages are left behind. These languages lack the large, high-quality speech datasets needed to train modern TTS systems. Speakers of such low-resource languages (LRLs) therefore cannot fully benefit from advances in TTS and in language technology in general.

One solution for developing good-quality TTS for LRLs is cross-lingual transfer learning. This means first training a TTS system on a source language with abundant data (such as English) and then adapting it to the LRL using its limited data. This project works toward establishing best practices for this approach. First, it looks into how best to select the source language. Second, it examines how to handle the differing sound systems of the source and target languages. Third, it explores how to evaluate TTS quality efficiently with less human effort. Lastly, it experiments with ways of working around the lack of a pronunciation dictionary, which is often missing for LRLs.

The project’s experiments focused on the case study of Frisian and culminated in an open-source Frisian TTS model that everyone can use (https://phat-do.github.io/Frysk-TTS). The findings were also validated with other diverse LRLs: Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek. The promising results of the project contribute to making modern and high-quality TTS more accessible to LRLs, improving the inclusivity of language technology in our increasingly digital world.

Dissertation: https://hdl.handle.net/11370/8fcc5233-35a4-45ea-80ea-ee628b4f0302

