Doctoral thesis

Self-supervised robot learning for spatial perception

  • 2023

PhD: Università della Svizzera italiana

Deep learning techniques are now ubiquitous in robot perception tasks, thanks to their ability to recognize complex patterns and handle high-dimensional data. Crucial to the success of a robot learning approach is the amount and quality of labeled training data. Collecting a large amount of labeled training data requires considerable effort and resources: human experts may be employed to manually label the collected data, which takes many person-hours, or, alternatively, one may use dedicated equipment capable of providing the needed labels; however, acquiring and maintaining such equipment is expensive and requires an accurate setup, especially when the system needs calibration to match the expected ground truth. One potential solution is to use a simulator, which provides perfect knowledge of the state of the environment. This solution, in turn, brings its own challenges related to the reality gap, i.e., the many differences between simulated and real data. Robots ought to work in the real world, where gathered information is complex, imprecise, and noisy by nature, whereas simulated data is often too simplistic for training. In this dissertation, we discuss and propose novel approaches for self-supervised robot learning, where the robot autonomously collects data and uses it to supervise the training or fine-tuning of a deep learning model. Specifically, we focus on spatial perception tasks, which entail the robot’s ability to interpret complex visual data to estimate the geometrical properties of the environment, including the location of humans, obstacles, robots, and other relevant objects. Self-supervised robot learning is compelling because it allows the robot to collect large quantities of training data without requiring the involvement of humans; in fact, the robot may collect data in all the environments it can explore, including the one in which it will be deployed.
The model is trained on the task at hand using the collected data; in addition, the model may be asked to solve an auxiliary task, called a pretext task, to learn better features and improve its performance. By introducing the pretext task, we reduce the amount of labeled data required to achieve an adequate level of performance. In the following, we describe our work in the field, from the design of approaches and their implementation to the validation on held-out testing data and in-field experiments. Our contributions to the state of the art concern three areas within self-supervised robot learning. First, we propose novel ways to derive supervision from sensors mounted on the robot, combining multiple sensors’ readings collected in a time window. Second, we tackle some of the shortcomings in current self-supervised robot learning approaches by taking advantage of partially labeled examples and dealing with the noise affecting sensors’ readings. Third, we introduce novel self-supervised pretext tasks tailored to robotics and aimed at improving the performance of spatial perception models. Finally, we present three potential research avenues for alleviating the challenges associated with large-scale data collection and for the improvement of perception models.
  • English