Reading Between the Lines - Artificial Intelligence in Linguistics
13 October 2020
Team Industry Relations
Every day, billions of words are written: short texts and long texts, formal prose and informal gibberish. Language is a big part of how we communicate what we think, how we feel and who we are. Every text carries more meaning than its words alone. How can we make sense of it all? Professor Malvina Nissim from the Center for Language and Cognition (CLCG) of the University of Groningen (UG) researches applications of artificial intelligence (AI) in linguistics. She is particularly interested in author profiling: finding out about the characteristics of an author by analyzing their text.
Application of AI in Linguistics
When thinking of AI in linguistics, many people probably think of machine translation first. While the output of translation services was questionable a few years ago, it is impressive today. More broadly, natural language processing (NLP) is a large field of AI research that develops systems for processing human language, so that Siri, Cortana and similar assistants can understand us. There are many more approaches besides. The progress linguists and computer scientists are making in all these domains is spectacular, but language is still far from completely figured out, says Nissim. She stresses that language is very complex: “When people sometimes say AI will take over, I am not afraid, in a way.” Many mysteries of language remain unsolved, such as irony, sarcasm and implicit knowledge.
Momentum of AI research at the UG, mapping a few projects
In her own research, Nissim focuses on author profiling: the task of inferring traits of an author, such as gender or age, from a written text. You might wonder why it matters whether a woman or a man wrote a specific text. Nissim wondered the same when she first heard about the topic. But after seeing a very stereotypical word cloud comparing men and women, she wanted to dig deeper and see whether there was more to it than stereotypes. The word clouds she saw grouped women with shopping, make-up and horseback riding, and men with soccer and beer.
However, she discovered there is much more than the stereotypical: men, for example, say “I am” more often than women, while women use the expression “to say” more often. These are just small examples of the distinct expressions that different genders tend to use. Such patterns remain invisible to human readers but can be identified by algorithms at scale. Nissim says that machines look at words differently: they are able to pay attention to very small details. Overall, machines and humans reach similar accuracy at author profiling, around 75 to 80%. However, they get very different examples right. Nissim found that humans fall for stereotypes to a larger extent, while machines focus more on the hidden patterns. This has potential not only for the translation and advertising industries; it also helps linguists understand language better. For Nissim, that is very important. “This is not a secondary point,” she says.
Real-life and industry applications
The research of Nissim’s group has countless applications. The group collaborates with the Netherlands Forensic Institute on authorship attribution. Cutting letters out of a newspaper is not enough to protect your anonymity. Together, the institute and computational linguists from the University of Groningen develop techniques to determine whether two texts were written by the same person, based only on the words and style used. Not all applications of authorship attribution are such detective work, however. In literature, some authors write under pseudonyms: the bestselling Italian novelist Elena Ferrante, for instance, does not exist. The name is a pseudonym, and many researchers are applying authorship attribution techniques to find out who is behind it.
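The article does not describe the techniques used by the group or the institute. As a minimal sketch of the general idea, assuming one common and very simple stylometric approach, the code below represents each text by the relative frequencies of a few function words and compares two such profiles with cosine similarity. The word list, the `same_author` helper and the 0.9 threshold are all hypothetical choices made for illustration, not the researchers’ actual method.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a few common English function words.
# Real systems use far larger and carefully chosen feature sets.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "i", "it", "that", "is", "am", "say"]


def profile(text: str) -> list[float]:
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def same_author(text_a: str, text_b: str, threshold: float = 0.9) -> bool:
    """Hypothetical decision rule: same author if the style profiles are close enough."""
    return cosine(profile(text_a), profile(text_b)) >= threshold


if __name__ == "__main__":
    a = "I am sure that the plan is good, and I am ready to say so."
    b = "I am certain that the idea is sound, and I am happy to say it."
    print(round(cosine(profile(a), profile(b)), 3), same_author(a, b))
```

Function words are a popular choice in stylometry because writers use them largely unconsciously, so they say more about style than about topic. In practice the threshold would have to be calibrated on texts of known authorship, and two short texts like the ones above give only a very rough signal.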
As Nissim notes, if you want to analyze language, you really need to go deep: staying at the surface only turns up stereotypes, and those are not very insightful. The field of computational linguistics is as versatile as the languages we speak.