Summary
In this work, we identify two principles as key to better generalization. First, we benefit from
learning a latent space of partners that distills their policies into a structure that is useful for the
robot's policy and that makes partners easy to identify at test time. Second, because this space may
not perfectly capture the space of real human policies, we should design it to be adaptable using real
test-time interaction data.