Towards Clear Evaluation of Robotic Visual Semantic Navigation

dc.contributor.author: Gutiérrez-Álvarez, Carlos
dc.contributor.author: Hernández-García, Sergio
dc.contributor.author: Nasri, Nadia
dc.contributor.author: Cuesta-Infante, Alfredo
dc.contributor.author: López-Sastre, Roberto J.
dc.date.accessioned: 2025-01-30T13:01:38Z
dc.date.available: 2025-01-30T13:01:38Z
dc.date.issued: 2023-05-01
dc.description: Paper published at 2023 9th International Conference on Automation, Robotics and Applications (ICARA)
dc.description.abstract: In this paper we address the problem of visual semantic navigation (VSN), in which a robot must navigate through an environment to reach a target object with access only to egocentric RGB perception sensors. This is a recently explored problem, where most approaches leverage the latest advances in deep learning models for visual perception combined with reinforcement learning (RL) strategies. Nonetheless, after a review of the literature, it is difficult to perform direct comparisons between the different solutions. The main difficulties lie in the fact that the navigation environments in which the experimental metrics are reported are not accessible, and each approach uses a different RL library. In this paper, we release a publicly available experimental setup for the VSN problem, with the aim of providing a clear benchmark. It has been constructed using pyRIL, an open-source Python library for RL, and two navigation environments: MiniWorld-Maze from gym-miniworld, and one 3D scene from the HM3D dataset using the AI Habitat simulator. We finally propose a state-of-the-art VSN model, consisting of a Contrastive Language-Image Pre-training (CLIP) visual encoder plus a set of two recurrent neural networks that produce the discrete navigation actions. This model is evaluated in the proposed experimental setup, with a careful analysis of the main VSN challenges, namely the sparse rewards problem and the exploitation-exploration trade-off. Code is available at: https://github.com/gramuah/vsn.
dc.identifier.citation: C. Gutiérrez-Álvarez, S. Hernández-García, N. Nasri, A. Cuesta-Infante and R. J. López-Sastre, "Towards Clear Evaluation of Robotic Visual Semantic Navigation," 2023 9th International Conference on Automation, Robotics and Applications (ICARA), Abu Dhabi, United Arab Emirates, 2023, pp. 340-345, doi: 10.1109/ICARA56516.2023.10125866. Keywords: Visualization; Three-dimensional displays; Navigation; Semantics; Stochastic processes; Robot sensing systems; Libraries; navigation; reinforcement learning; robot; deep learning.
dc.identifier.doi: 10.1109/ICARA56516.2023.10125866
dc.identifier.isbn: 978-1-6654-8921-8
dc.identifier.issn: 2767-7745
dc.identifier.uri: https://hdl.handle.net/10115/71698
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers
dc.rights.accessRights: info:eu-repo/semantics/closedAccess
dc.subject: Deep learning
dc.subject: Learning models
dc.subject: Learning strategy
dc.subject: Learning systems
dc.subject: Navigation
dc.subject: Navigation environment
dc.subject: Navigation problem
dc.subject: Open-source
dc.subject: Recurrent neural networks
dc.subject: Reinforcement learning
dc.subject: Robot
dc.subject: Robot vision
dc.subject: Semantic navigation
dc.subject: Semantics
dc.subject: Visual perception
dc.subject: Visual semantics
dc.subject: Visual languages
dc.title: Towards Clear Evaluation of Robotic Visual Semantic Navigation
dc.type: Article
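
The record above describes the navigation model only at a high level: a Contrastive Language-Image Pre-training (CLIP) visual encoder whose features are fed to two recurrent neural networks that output discrete navigation actions. The following is a minimal, hypothetical PyTorch sketch of such a policy. The class name VSNPolicy, the choice of GRUs, the hidden sizes, and the dummy encoder in the usage lines are illustrative assumptions, not the authors' implementation; their actual code is at https://github.com/gramuah/vsn.

import torch
import torch.nn as nn

class VSNPolicy(nn.Module):
    # Sketch of a VSN policy: a frozen image encoder (e.g. a CLIP image tower,
    # assumed here) feeding two stacked GRUs and a linear head that produces
    # logits over the discrete navigation actions.
    def __init__(self, visual_encoder, feat_dim=512, hidden_dim=256, num_actions=4):
        super().__init__()
        self.encoder = visual_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # keep the pretrained encoder frozen
        self.rnn1 = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.rnn2 = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, frames, h1=None, h2=None):
        # frames: (batch, seq_len, 3, H, W) egocentric RGB observations
        b, t = frames.shape[:2]
        with torch.no_grad():
            feats = self.encoder(frames.flatten(0, 1))  # (b * t, feat_dim)
        feats = feats.view(b, t, -1).float()
        x, h1 = self.rnn1(feats, h1)
        x, h2 = self.rnn2(x, h2)
        return self.action_head(x), (h1, h2)  # per-step action logits + RNN state

# Shape check with a dummy encoder standing in for CLIP (hypothetical sizes):
dummy_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
policy = VSNPolicy(dummy_encoder)
logits, state = policy(torch.rand(1, 8, 3, 224, 224))  # logits: (1, 8, num_actions)

The action logits would then be sampled by whatever RL algorithm drives training (the paper uses the pyRIL library for that part), which is where the sparse-reward and exploration challenges discussed in the abstract arise.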

Files

Original bundle

Name: icara2023-AM.pdf
Size: 4.31 MB
Format: Adobe Portable Document Format