
Title: Generalization Guarantees for Multi-Modal Imitation Learning

Abstract: Control policies from imitation learning can often fail to generalize to novel environments due to imperfect demonstrations or the inability of imitation learning algorithms to accurately infer the expert’s policy. In this paper, we present rigorous generalization guarantees for imitation learning by leveraging the Probably Approximately Correct (PAC)-Bayes framework to provide upper bounds on the expected cost of policies in novel environments. We propose a two-stage training method where a latent policy distribution is first embedded with multi-modal expert behavior using a conditional variational autoencoder, and then “fine-tuned” in new training environments to explicitly optimize the generalization bound. We demonstrate strong generalization bounds and their tightness relative to empirical performance in simulation for (i) grasping diverse mugs, (ii) planar pushing with visual feedback, and (iii) vision-based indoor navigation, as well as through hardware experiments for the two manipulation tasks.
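For readers unfamiliar with the framework, a PAC-Bayes bound of the kind invoked above typically takes the following McAllester-style form. This is a generic sketch, not the exact bound from the paper: the notation (N i.i.d. training environments E_1, ..., E_N, cost C in [0, 1], data-independent prior P_0 over policies, posterior P) is assumed here for illustration.

```latex
% McAllester-style PAC-Bayes bound (generic form; costs assumed in [0,1]):
% with probability at least 1 - \delta over the draw of E_1, \dots, E_N,
C(P) \;\le\;
\underbrace{\frac{1}{N}\sum_{i=1}^{N}\mathbb{E}_{w\sim P}\!\left[C(w;\,E_i)\right]}_{\text{empirical cost }\hat{C}_S(P)}
\;+\;
\sqrt{\frac{\mathrm{KL}\!\left(P \,\|\, P_0\right)+\log\frac{2\sqrt{N}}{\delta}}{2N}}
```

Below is a minimal PyTorch-style sketch of the two-stage recipe the abstract describes: Stage 1 fits a conditional VAE to expert (observation, action) pairs so the latent variable captures multi-modal behavior; Stage 2 treats the latent distribution as the PAC-Bayes posterior and fine-tunes it against (empirical cost) + (KL regularizer from the bound). All architecture choices, dimensions, the standard-normal prior, and the cost surrogate are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of the two-stage training described in the abstract.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, LATENT_DIM = 16, 4, 8

class CVAE(nn.Module):
    """Conditional VAE: encoder q(z | o, a), decoder pi(a | o, z)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * LATENT_DIM))
        self.dec = nn.Sequential(nn.Linear(OBS_DIM + LATENT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM))

    def forward(self, obs, act):
        mu, log_var = self.enc(torch.cat([obs, act], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        act_hat = self.dec(torch.cat([obs, z], -1))
        recon = F.mse_loss(act_hat, act)
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(-1).mean()
        return recon + kl  # negative ELBO (up to constants)

# Stage 1: embed multi-modal expert behavior in the latent variable z.
model = CVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs = torch.randn(256, OBS_DIM)  # stand-in for expert demonstration data
act = torch.randn(256, ACT_DIM)
for _ in range(100):
    opt.zero_grad()
    model(obs, act).backward()
    opt.step()

# Stage 2: fine-tune the latent (diagonal-Gaussian) posterior P on N new
# training environments by minimizing the right-hand side of the bound.
N, delta = 50, 0.01
mu_p = torch.zeros(LATENT_DIM, requires_grad=True)       # posterior mean
log_var_p = torch.zeros(LATENT_DIM, requires_grad=True)  # posterior log-var
opt2 = torch.optim.Adam([mu_p, log_var_p], lr=1e-2)

def empirical_cost(mu, log_var):
    """Hypothetical differentiable surrogate for rollout cost in [0, 1]."""
    z = mu + torch.randn(N, LATENT_DIM) * (0.5 * log_var).exp()
    return torch.sigmoid(z.pow(2).sum(-1)).mean()

for _ in range(100):
    opt2.zero_grad()
    # KL between the posterior and a standard-normal prior P_0
    kl = 0.5 * (log_var_p.exp() + mu_p.pow(2) - 1 - log_var_p).sum()
    reg = torch.sqrt((kl + math.log(2 * math.sqrt(N) / delta)) / (2 * N))
    (empirical_cost(mu_p, log_var_p) + reg).backward()
    opt2.step()
```

The design point the abstract emphasizes is that the KL term appearing in the bound is exactly the regularizer in Stage 2, so fine-tuning jointly reduces empirical cost and tightens the certified upper bound rather than optimizing performance alone.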

Authors: Allen Z. Ren, Sushant Veer, Anirudha Majumdar


By szf