Scalable visual target navigation with foundation models

Autonomous robots are becoming increasingly capable of operating in indoor environments, but reliably finding a specific target in an unfamiliar space remains difficult. A robot that is asked to find an object, such as a laptop, a document, or a piece of equipment, must interpret what it sees, decide where to explore, and adapt when the environment is only partially known. In his thesis, Bangguo Yu studies visual target navigation, a problem at the intersection of robot perception, mapping, reasoning, and decision-making.
Yu develops a modular navigation framework that progresses from single-robot search to multi-robot cooperation. He first shows how reinforcement learning can improve exploration by combining semantic maps with frontier-based search. He then demonstrates that large language models can provide useful commonsense knowledge for object search without costly task-specific training. Next, he extends navigation from simple object categories to richer natural-language descriptions, enabling robots to search for targets described by attributes or spatial relations. He also introduces a cooperative multi-robot setting in which several robots share information and divide the exploration effort more effectively.
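To make the idea concrete, the sketch below shows one way frontier-based exploration can be biased by semantic and commonsense cues. It is a minimal illustration rather than Yu's implementation: the function names, the distance weight, and the toy likelihood table are assumptions that stand in for the learned policy or language-model query described in the thesis.

```python
import math

def select_frontier(frontiers, semantic_map, robot_pos, target, room_likelihood):
    """Return the frontier cell most worth exploring for the given target.

    frontiers:       list of (x, y) cells on the explored/unexplored boundary
    semantic_map:    dict (x, y) -> room or object label observed near that cell
    robot_pos:       current (x, y) position of the robot
    target:          object category to find, e.g. "laptop"
    room_likelihood: callable (target, label) -> score in [0, 1]; in the thesis
                     setting this role would be played by a learned policy or a
                     language model asked how likely the target is near that label
    """
    def score(cell):
        label = semantic_map.get(cell, "unknown")
        commonsense = room_likelihood(target, label)
        dist = math.dist(robot_pos, cell)
        # Prefer semantically promising frontiers; lightly penalise distance
        # so that nearby frontiers win ties (weight chosen arbitrarily here).
        return commonsense - 0.05 * dist

    return max(frontiers, key=score) if frontiers else None


if __name__ == "__main__":
    # Toy usage: a hand-written likelihood table stands in for an LLM query.
    table = {("laptop", "office"): 0.9, ("laptop", "kitchen"): 0.2}
    likelihood = lambda tgt, lbl: table.get((tgt, lbl), 0.1)
    frontiers = [(2, 3), (10, 1)]
    semantic_map = {(2, 3): "kitchen", (10, 1): "office"}
    print(select_frontier(frontiers, semantic_map, (0, 0), "laptop", likelihood))
```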
Finally, Yu addresses privacy-aware navigation, allowing robots to choose routes that reduce unnecessary exposure in sensitive or crowded environments.

Together, the results show that combining mapping, language-based reasoning, vision-language models, and robot cooperation can make autonomous navigation more efficient, more flexible, and better aligned with real-world requirements.