
Open-Ended Affordance Discovery in Robotics Using Pertinent Visual Features

Abstract : Scene understanding is a challenging problem in computer vision and robotics. It is traditionally addressed as an observation-only process, in which the robot acquires data on its environment through its exteroceptive sensors and processes it with specific algorithms (for example, deep neural networks in modern approaches) to produce an interpretation: 'This is a chair because this looks like a chair'. For a robot to operate properly in its environment, it needs to understand it, that is, to make sense of it in relation to its motivations and its action capacities. We believe that scene understanding requires interaction with the environment, wherein perception, action and proprioception are integrated. The work described in this thesis explores this avenue, which is inspired by work in psychology and neuroscience showing the strong link between action and perception. The concept of affordance was introduced by James J. Gibson in 1977. It states that animals tend to perceive their environment through what they can accomplish with it (what it affords them), rather than solely through its intrinsic properties: 'This is a chair because I can sit on it'. There is a variety of approaches studying affordances in robotics, largely agreeing on representing an affordance as a triplet (effect, (action, entity)), such that the effect is generated when the action is exerted on the entity. However, most authors use predefined features to describe the environment. We argue that building affordances on predefined features defeats their purpose, because it limits them to the perceptual subspace generated by these features. Furthermore, we argue that it is impractical to predefine a set of features general enough to describe entities in open-ended environments. In this thesis, we propose and develop an approach that enables a robot to learn affordances while simultaneously building relevant features describing the environment.
To bootstrap affordance discovery we use a classical interaction loop. The robot executes a sequence of motor controls (action a) on a part of the environment ('object' o) described using a predefined set of initial features (color and size) and observes the result (effect e). By repeating this process, a dataset of (e, (a, o)) instances is built. This dataset is then used to train a predictive model of the affordance. To learn a new feature, the same loop is used, but instead of a predefined set of descriptors of o we use a deep convolutional neural network (CNN). The raw data of o (2D images) is used as input and the effect e as the expected output. The action is implicit, as a different CNN is trained for each specific action. The training is self-supervised, as the interaction data is produced by the robot itself. In order to predict the affordance correctly, the network must extract features that are directly relevant to the environment and to the motor capabilities of the robot. Any feature learned by the method can then be added to the initial descriptor set. To achieve open-ended learning, whenever the agent executes the same action on two objects that appear similar under the currently used set of features but does not observe the same effect, it has to assume that it does not possess the features needed to distinguish those objects with respect to this action. It therefore needs to discover and learn new features to reduce the ambiguity, and it uses the same CNN-based approach to enrich its descriptor set. Several experiments on a real robotic setup showed that we can reach predictive performance similar to that of classical approaches which use predefined descriptors, while avoiding their limitations.
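The trigger for learning a new feature, as described in the abstract, can be illustrated with a minimal sketch. It is not the thesis implementation: the `Interaction` record, the `find_ambiguities` helper, the Euclidean distance and the tolerance `tol` are all hypothetical choices made here for illustration. The sketch scans a dataset of (e, (a, o)) instances and reports pairs where the same action on look-alike objects produced different effects.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass(frozen=True)
class Interaction:
    """One (e, (a, o)) instance; field names are illustrative assumptions."""
    effect: str       # observed effect e
    action: str       # executed action a
    features: tuple   # current descriptor of object o, e.g. (hue, size)


def find_ambiguities(dataset, tol=0.1):
    """Return pairs with the same action, similar features, different effects.

    Similarity is assumed to be Euclidean distance <= tol; the source does
    not specify the criterion actually used.
    """
    conflicts = []
    for first, second in combinations(dataset, 2):
        same_action = first.action == second.action
        look_alike = math.dist(first.features, second.features) <= tol
        if same_action and look_alike and first.effect != second.effect:
            conflicts.append((first, second))
    return conflicts


dataset = [
    Interaction("moved", "push", (0.10, 0.50)),
    Interaction("none", "push", (0.12, 0.52)),   # looks alike, behaves differently
    Interaction("moved", "push", (0.90, 0.20)),  # clearly different object
    Interaction("none", "lift", (0.10, 0.50)),   # other action: never compared with push
]

# A non-empty result signals that the current features are insufficient
# for this action, so a new feature would have to be learned.
needs_new_feature = find_ambiguities(dataset)
```

In this toy dataset only the first two push interactions conflict, so they would be the ones motivating a new learned feature for the push action.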
Contributor : ABES STAR
Submitted on : Wednesday, March 16, 2022 - 1:14:09 PM
Last modification on : Thursday, June 2, 2022 - 2:55:49 PM
Long-term archiving on : Friday, June 17, 2022 - 7:09:49 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03610427, version 1


Pierre Luce-Vayrac. Open-Ended Affordance Discovery in Robotics Using Pertinent Visual Features. Robotics [cs.RO]. Sorbonne Université, 2019. English. ⟨NNT : 2019SORUS670⟩. ⟨tel-03610427⟩


