CLCG colloquium: Elia Bruni

When: Thu 14-12-2017 12:00 - 12:45
Where: Harmonie building, room 1313.0338

Elia Bruni (
Visually Grounded Dialogue

Combining information from language and vision has recently received a
lot of attention in AI. On the one hand, the computer vision community
is exploiting NLP methods to deepen image understanding
(think of image captioning or image generation from text
descriptions). On the other hand, the NLP community has recognized
the importance of the visual channel for grounding computational models
of language in the visual world, for example in the construction of
visually grounded semantic representations. Despite such progress,
these models instantiate rather fragile connections between vision and
language, and we are still far from truly grasping the linkage between
these two modalities. One reason is that these systems are mainly
designed to learn from static environments, where a single agent is
repeatedly exposed to annotated examples and learns by trying to
reproduce the annotations or images (that is, by plain supervised
learning).
In this talk, I will introduce a multimodal learning framework in which
two agents must cooperate through language to achieve a
goal that is grounded in an external visual world. More specifically,
I will talk about two alternative tasks that extend multimodal
learning to dialogue interactions (Visual Dialogue and GuessWhat?!),
and introduce two of our current contributions to this framework: a new
module within multimodal dialogue that acts as a dialogue
manager, and a new multimodal dialogue task specifically designed to
capture a linguistic phenomenon called partner specificity.

