Research Center for Language and Cognition (CLCG) CLCG colloquium

Schedule and speakers 2017

Dates 2017 Speaker & Title of Presentation Time & Location
December 14

Elia Bruni (

Title: Visually Grounded Dialogue

November 23

Mariët Theune, Human Media Interaction (HMI) University of Twente (

Title: "R3D3: conversations with a virtual human / robot duo"

October 19

Prof. Dirk Hovy, University of Copenhagen

Title: NLP, the perfect social (media) science?


"R3D3: conversations with a virtual human / robot duo"

Mariët Theune, Human Media Interaction (HMI) University of Twente

Much research has been done in recent years on interaction with either virtual humans or social robots in various types of applications. However, applications that feature both a virtual human and a social robot are not common. The Rolling Receptionist Robot with Double Dutch Dialogue (R3D3) is an exception. It consists of two agents: a robot and a virtual human, carried by the robot on a tablet. The virtual human is capable of holding simple spoken conversations in Dutch. The robot does not speak, but can make head gestures and use affective gaze.

In this talk I will present R3D3 and discuss some experiments we carried out with it, both in controlled conditions and in a field study with groups of children in the NEMO Science Museum. In the experiments we investigated the role of the non-speaking robot in the conversations with R3D3. Specifically, we experimented with using the robot for turn management in multi-party conversations with R3D3.

Turn-taking is seen as an important factor in managing fluent conversations. A key turn-taking behaviour in conversations between humans is the intentional direction of gaze. Gaze has also been shown to be a highly effective mechanism for turn management in human-robot interaction, especially when interacting with multiple people. Our findings suggest that the robot's gaze is a powerful social signal for R3D3 as well.

"NLP, the perfect social (media) science?"

Prof. Dirk Hovy, University of Copenhagen

Language is the ultimate social medium: We don't just communicate to
convey information, but also to entertain, to gossip, to console, and
much more. Social media is one of the purest expressions of all of
these aspects of language, and often includes additional information
about the place, time, and author of a message.

This combination has allowed NLP to work on real, situated, individual
language, rather than on abstract general corpora, and led it into
areas that were previously the sole domain of social sciences. These
areas open up a wide range of exciting new applications, but also
present a host of new challenges - technically, linguistically, and

In this talk, I will illustrate both opportunities and challenges with
some of my ongoing research, and end with a number of open questions
that I believe will guide NLP for the years to come.

Dirk Hovy is an associate professor at the University of Copenhagen. His
research focuses on computational sociolinguistics, the interaction of
NLP, demographic factors, and language, and its consequences for
performance, fairness, and personalization of statistical models. He
is also interested in ethical questions of bias and algorithmic
fairness in ML in general, and recently co-organized the EACL workshop
Ethics in NLP.

Dirk holds a PhD in NLP from the University of Southern California,
and a Magister in sociolinguistics from the University of Marburg,

Visually Grounded Dialogue

Elia Bruni (

Combining information from language and vision has recently received a
lot of attention in AI. On the one hand, the computer vision community
is exploiting NLP methods in order to deepen image understanding
(think of image captioning or image generation from text
descriptions). On the other hand, the NLP community has understood
the importance of the visual channel for grounding computational models of
language in the visual world, for example in the construction of
visually grounded semantic representations. Despite such progress,
these models instantiate rather fragile connections between vision and
language, and we are still far from truly grasping the linkage between
these two modalities.  One of the reasons for this is that these
systems are mainly devised to learn from very static environments,
where a single agent is repeatedly exposed to annotated examples and
learns by trying to reproduce the annotations or images (by plain
supervised learning).

In this talk, I will introduce a multimodal learning framework where
two agents will have to cooperate via language in order to achieve a
goal that is grounded in an external visual world. More specifically,
I will talk about two alternative tasks that upgrade multimodal
learning to dialogue interactions (Visual Dialogue and GuessWhat?!),
and introduce two of our current contributions to this system: a new
module within multimodal dialogue that acts as a dialogue
manager, and a new multimodal dialogue task specifically designed to
capture a linguistic phenomenon called partner specificity.

Last modified: 03 June 2019 12:04