Learning Representations that Enable Generalization in Assistive Tasks

Jerry Zhi-Yang He, Zackory Erickson, Daniel S. Brown, Aditi Raghunathan, and Anca D. Dragan

CoRL 2022 Paper Code (Soon) BibTeX

PALM - Prediction-Based Assistive Latent eMbedding

Jointly learning the Latent Embedding and the robot policy. The resulting latent space captures
the underlying structure of the preferences and strategies of the training humans.

Summary

In this work, we identify two principles as key to enabling better generalization. First, the robot benefits from learning a latent space of partners that distills their policies down to a structure that is useful for the robot’s policy and that makes partners easy to identify at test time. Second, this latent space will not perfectly capture the space of real human policies, so it should be designed to be adaptable using real test-time interaction data.

Learning an assistive latent space with an action prediction loss

Given an interaction history of N pairs of robot observations and human actions, we embed this trajectory into a low-dimensional manifold and use the embedding to predict the next human action. The intuition is that if we can predict the next action, the embedding has captured sufficient information about the human's policy. Note that this requires no supervision or prior information about the human's preferences, so the latent space can be trained in an unsupervised fashion.
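As a rough illustration of the idea above, the sketch below embeds a short interaction history into a latent vector and scores it by how well it predicts the next human action. Everything here is an assumption made for illustration: the linear encoder and decoder, the mean-pooling over pairs, the squared-error loss, and all names (`encode`, `predict_action`, `action_prediction_loss`, `W_enc`, `W_dec`) are hypothetical and are not taken from the paper. Only the forward pass is shown; in practice the weights would be trained with an autodiff library.

```python
import random


def encode(history, W_enc):
    """Mean-pool a linear embedding of each (observation, action) pair into z."""
    latent_dim = len(W_enc)
    z = [0.0] * latent_dim
    for obs, act in history:
        x = list(obs) + list(act)  # concatenate robot observation and human action
        for i in range(latent_dim):
            z[i] += sum(w * xi for w, xi in zip(W_enc[i], x))
    return [zi / len(history) for zi in z]


def predict_action(obs, z, W_dec):
    """Hypothetical linear decoder: next human action from (observation, latent)."""
    x = list(obs) + list(z)
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_dec]


def action_prediction_loss(history, next_obs, next_act, W_enc, W_dec):
    """Mean squared error between predicted and actual next human action."""
    z = encode(history, W_enc)
    pred = predict_action(next_obs, z, W_dec)
    return sum((p - a) ** 2 for p, a in zip(pred, next_act)) / len(next_act)


# Toy usage with random (untrained) weights and made-up dimensions.
random.seed(0)
obs_dim, act_dim, latent_dim, n_pairs = 3, 2, 2, 4
W_enc = [[random.uniform(-1, 1) for _ in range(obs_dim + act_dim)]
         for _ in range(latent_dim)]
W_dec = [[random.uniform(-1, 1) for _ in range(obs_dim + latent_dim)]
         for _ in range(act_dim)]
history = [([random.random() for _ in range(obs_dim)],
            [random.random() for _ in range(act_dim)])
           for _ in range(n_pairs)]
next_obs = [random.random() for _ in range(obs_dim)]
next_act = [random.random() for _ in range(act_dim)]

z = encode(history, W_enc)
loss_value = action_prediction_loss(history, next_obs, next_act, W_enc, W_dec)
print("latent:", z)
print("action prediction loss:", loss_value)
```

Note that the loss uses only observations and human actions from the interaction itself, which is why no labels about the human's preferences are needed.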

We recover the latent space of a toy environment in an unsupervised fashion.
The KL coefficient regularizes it to different degrees.
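One plausible way to write an action prediction objective with a KL coefficient is the VAE-style form below. This is a sketch under assumptions, not the paper's stated loss: the page only mentions a KL coefficient, so the standard normal prior and all symbols (the encoder q_φ, the action predictor p_θ, the coefficient β, and the history τ of N observation-action pairs) are notation introduced here for illustration.

```latex
\tau = \{(o_i, a^{H}_i)\}_{i=1}^{N}, \qquad
\mathcal{L}(\phi, \theta)
  = \mathbb{E}_{\tau}\Big[
      \mathbb{E}_{z \sim q_\phi(z \mid \tau)}
        \big[ -\log p_\theta\big(a^{H}_{N+1} \mid o_{N+1}, z\big) \big]
      + \beta \, D_{\mathrm{KL}}\big( q_\phi(z \mid \tau) \,\|\, \mathcal{N}(0, I) \big)
    \Big]
```

Under this form, a larger β pulls the embeddings toward the prior and yields a more strongly regularized latent space, while a smaller β lets the embeddings spread out to fit the prediction term.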

Test-time optimization

At test time, when working with a new user, we would like our encoding of that user to match the user's true latent information. To achieve this, we optimize the same action prediction objective for a few steps, which we refer to as test-time adaptation.
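The following minimal sketch shows what a few steps of test-time adaptation could look like. It rests on several assumptions that are not from the paper: the latent vector is optimized directly while a hypothetical linear action predictor (`W_o`, `W_z`) stays frozen, the loss is squared error, and the gradient is computed by hand. The page does not say which parameters are updated, so adapting the encoder instead would be an equally valid reading.

```python
def predict(obs, z, W_o, W_z):
    """Hypothetical frozen linear predictor: action = W_o @ obs + W_z @ z."""
    return [
        sum(w * o for w, o in zip(W_o[k], obs))
        + sum(w * zj for w, zj in zip(W_z[k], z))
        for k in range(len(W_o))
    ]


def loss(z, data, W_o, W_z):
    """Mean over pairs of the squared action prediction error."""
    total = 0.0
    for obs, act in data:
        pred = predict(obs, z, W_o, W_z)
        total += sum((p - a) ** 2 for p, a in zip(pred, act))
    return total / len(data)


def adapt_latent(z_init, data, W_o, W_z, steps=20, lr=0.1):
    """Run a few gradient steps on z only, using the new user's data."""
    z = list(z_init)
    for _ in range(steps):
        grad = [0.0] * len(z)
        for obs, act in data:
            pred = predict(obs, z, W_o, W_z)
            for k, (p, a) in enumerate(zip(pred, act)):
                for j in range(len(z)):
                    # d/dz_j of (p - a)^2, averaged over the data
                    grad[j] += 2.0 * (p - a) * W_z[k][j] / len(data)
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


# Toy usage: the new user's actions are generated with a latent of [0.5, -0.3].
W_o = [[1.0, 0.0], [0.0, 1.0]]
W_z = [[1.0, 0.0], [0.0, 1.0]]
data = [
    ([1.0, 0.0], [1.5, -0.3]),
    ([0.0, 1.0], [0.5, 0.7]),
    ([0.5, 0.5], [1.0, 0.2]),
]
z_init = [0.0, 0.0]  # e.g. the encoder's initial guess for this user
z_adapted = adapt_latent(z_init, data, W_o, W_z)
print("loss before:", loss(z_init, data, W_o, W_z))
print("loss after: ", loss(z_adapted, data, W_o, W_z))
print("adapted latent:", z_adapted)
```

In this toy setup the adapted latent moves from the initial guess toward the value that generated the user's actions, and the prediction loss drops accordingly.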

Assistive Tasks

We evaluate our method on an assistive itch-scratching task, where the goal is to scratch an unknown location. We generate synthetic humans by varying the itch locations and the amount of action penalties. We find that PALM achieves better performance on out-of-distribution (OOD) humans.

We further investigate a prior method (RMA) and find that it fails both to infer the underlying structure of the two arms and to correctly embed OOD humans (red), whereas PALM successfully infers this latent structure.
