
Between man and machine

It goes without saying that our communicative society is no longer conceivable without the computer. We not only use it to communicate with other people, but we also communicate with computers themselves. Of course, this does not always work. One can laugh at funny translations by Google Translate or be frustrated by voice recognition that misinterprets one’s home location time after time. However, communication between man and machine keeps improving and the possibilities keep growing. This is what makes the field of computational linguistics so interesting and important.

Computational linguistics is concerned with how computers process natural (i.e. human) language. Within CLCG, computational linguists do research at the interface between linguistics and computer science and deal with theoretical, experimental and applied questions. Here, we present two examples that show how our researchers work on socially relevant projects.

What can computers learn from human translators?

Translation is of great significance

Translation is a difficult task, as Johan Bos, Professor of Computational Semantics, knows well by now. There is much more to it than translating word for word while taking into account that languages order their words differently. For example, if one literally translates “having a chat” into “een praatje hebben” (having idle talk), the meaning changes significantly, so we would not consider this a good translation. But what exactly makes a good translation, and how do you teach a computer to deal with such subtle differences? Bos' NWO VICI project "Lost in translation, found in meaning" (2015-2020) focuses on all aspects of translation, including the role of meaning.

His group of six researchers is building, among other things, the Parallel Meaning Bank: a large database of nearly a million English sentences translated by people into Dutch, German and Italian. Incorporating more languages offers new opportunities. So far, the automatic analysis of meaning has mainly focused on English, but it is precisely the differences between languages that can provide more insight into the translation process and pave the way for all kinds of applications worldwide.

In the Parallel Meaning Bank, the computer derives the meaning of each individual sentence through linguistic analysis. Differences between translations can then be found through automated analysis. One would expect translated sentences to have exactly the same meaning, but human translators are often much more creative. Sometimes they even leave things out or add information: small changes that seem to take a translation further from its source but actually make it sound more natural. Take the English sentence "He removed the dishes from the table." A literal Dutch translation yields the grammatical sentence “Hij verwijderde het vaatwerk van de tafel.” But no Dutch speaker would say that. Instead, you would say “Hij ruimde de tafel af.” (He cleared the table). “The dishes” have suddenly disappeared; their meaning is evidently contained in the verb “afruimen” (to clear away/to remove)!
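To give an idea of what such an automated comparison looks for, here is a toy sketch in Python. It is not how the Parallel Meaning Bank works internally: its real meaning representations are far richer. Here we simply assume that each sentence's meaning has been reduced to a set of concept labels (the labels and the function `meaning_differences` are hypothetical) and compare the two sets.

```python
# Toy sketch: meanings are ASSUMED to be plain sets of concept labels.
# The labels below are hypothetical, hand-written for the example sentence.
english = {"he", "remove", "dish", "table", "past"}  # "He removed the dishes from the table."
dutch = {"he", "clear", "table", "past"}             # "Hij ruimde de tafel af."


def meaning_differences(source: set, target: set) -> tuple:
    """Return (concepts missing from the translation, concepts the translator added)."""
    return source - target, target - source


lost, added = meaning_differences(english, dutch)
print(sorted(lost))   # ['dish', 'remove']
print(sorted(added))  # ['clear']
```

A comparison like this flags the vanished “dishes” as a difference; deciding that “afruimen” already implies them is exactly the kind of judgement that still requires linguistic knowledge.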

These are interesting issues: a self-learning computer system needs to perform a great many calculations to handle them, so it can certainly use a little help from humans. There is still a long way to go, but through this project we hope to understand exactly what human translators do and, ultimately, how to improve computer translations.

Gosse Bouma (portrait made by students in Linguistics)

Searching for good answers

Another form of communication between man and machine takes place in question answering. For example, you ask the personal assistant on your phone which club a particular soccer player plays for. The system searches large amounts of text on the web for combinations of footballers and clubs, counts the most frequently occurring combinations and thereby quickly finds the most likely answer. Gosse Bouma was already working on this kind of application years ago. It started with relatively simple questions, such as the one above, and with manual ways of searching large text files for the right relations. As a linguist, Bouma was naturally interested in how grammar could help to deal with more difficult questions and to reduce the number of incorrect answers.
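The counting strategy described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the system Bouma worked on: the text snippets, the invented player and club names, the single hand-written pattern and the function `most_likely_club` are all hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical snippets standing in for web text; all names are invented.
SNIPPETS = [
    "Striker Jan Jansen plays for FC Noord.",
    "Last season Jan Jansen played for SC Zuid.",
    "Jan Jansen plays for FC Noord, the club confirmed.",
    "Piet Pieters plays for SC Zuid.",
    "Fans cheered as Jan Jansen plays for FC Noord again.",
]

# One hand-written surface pattern: "<First Last> plays/played for <XX Name>".
PATTERN = re.compile(r"([A-Z][a-z]+ [A-Z][a-z]+) play(?:s|ed) for ([A-Z]{2} [A-Z][a-z]+)")


def most_likely_club(player: str, snippets: list) -> Optional[str]:
    """Return the club most often paired with `player`, or None if never seen."""
    counts = Counter()
    for text in snippets:
        for name, club in PATTERN.findall(text):
            if name == player:
                counts[club] += 1  # each matching snippet counts as one "vote"
    if not counts:
        return None
    club, _ = counts.most_common(1)[0]
    return club


print(most_likely_club("Jan Jansen", SNIPPETS))  # FC Noord (3 votes vs. 1)
```

The outdated snippet about SC Zuid is simply outvoted; relying on redundancy rather than on any single source is what makes this simple approach work for easy questions, and also why harder questions call for more linguistic analysis.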

Meanwhile, search methods continued to develop and it became easier to look for patterns. Thanks to the NWO KIEM project “Direct ter zake” (Straight to the point) (2015), Bouma was able to deepen his knowledge in this field. In KIEM projects, NWO stimulates collaboration between researchers and the creative industry in search of new knowledge and applications. In this case, the startup Bert Alkemade Creative Interaction wanted to know how computer search systems could help professionals by supplying them with relevant information automatically. This practical application was the main aim of the research project, and it immediately revealed the most difficult, but also the most important, issues the researchers had to address. How do you deal with complex concepts in official texts? What error rate is acceptable to users?

The collaboration was very satisfying. In the meantime, Bouma and Sokrates Technologies have started a new KIEM project. The company would like to develop a software application that enables aid organizations to quickly find relevant information in the UN database Reliefweb. This is an important source of information about crises around the world, and it is of course vital that aid organizations quickly get all relevant information during an Ebola outbreak or a flood. This time, the partners will use deep learning, a machine-learning method in which the system learns from large amounts of example data rather than from hand-written rules.

Last modified: 13 June 2019 1.55 p.m.