Play it by Ear: Learning Skills amidst Occlusion through Audio-Visual Imitation Learning

Published in Robotics: Science & Systems (RSS), 2022

Recommended citation: Maximilian Du*, Olivia Y. Lee*, Suraj Nair, Chelsea Finn. (2022). "Play it by Ear: Learning Skills amidst Occlusion through Audio-Visual Imitation Learning." Robotics: Science & Systems 2022. https://arxiv.org/pdf/2205.14850.pdf

Our proposed system learns a set of challenging, partially observed manipulation tasks from visual and audio inputs by combining offline imitation learning from a modest number of tele-operated demonstrations with online finetuning using human-provided interventions. In simulation, our system benefits from using audio, and by using online interventions we are able to improve the success rate of offline imitation learning by ~20%. On a Franka Emika Panda robot, our system completes manipulation tasks (e.g., extracting keys from a bag) with a 70% success rate, 50% higher than a policy that does not use audio.