Autonomous Evaluation and Refinement of Digital Agents


Abstract

Jiayi Pan, Yichi Zhang, Nickolas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr. COLM 2024 / ⭐️ MAR Workshop @ CVPR 2024 Best Paper.

We design model-based evaluators that both evaluate agents and autonomously improve their performance. We show that these open-ended evaluators can significantly improve agent performance, through either fine-tuning or inference-time guidance, without any extra supervision.
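
As a rough illustration of the two uses named in the abstract, the sketch below shows how an evaluator's score could (a) filter trajectories before fine-tuning and (b) drive a retry loop at inference time. It is not the authors' implementation: the `Trajectory` type, the function names, the 0.5 threshold, the feedback string, and the toy agent and evaluator are all hypothetical stand-ins chosen for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a trajectory is a list of action strings.
Trajectory = List[str]
Evaluator = Callable[[Trajectory], float]   # returns a success score in [0, 1]
Agent = Callable[[str], Trajectory]         # takes feedback text, returns a trajectory


def filter_for_finetuning(
    trajectories: List[Trajectory], evaluator: Evaluator, threshold: float = 0.5
) -> List[Trajectory]:
    """Keep only trajectories the evaluator scores at or above the threshold.

    The kept trajectories could then serve as fine-tuning data; no human
    labels are involved, only the evaluator's judgment.
    """
    return [t for t in trajectories if evaluator(t) >= threshold]


def run_with_guidance(
    agent: Agent, evaluator: Evaluator, max_attempts: int = 3, threshold: float = 0.5
) -> Tuple[Trajectory, int]:
    """Retry the agent until the evaluator accepts or attempts run out.

    Returns the last trajectory and the number of attempts used.
    """
    feedback = ""
    trajectory: Trajectory = []
    for attempt in range(1, max_attempts + 1):
        trajectory = agent(feedback)
        if evaluator(trajectory) >= threshold:
            return trajectory, attempt
        # Assumed feedback format; a real system would pass richer text.
        feedback = "previous attempt judged unsuccessful"
    return trajectory, max_attempts


# Toy stand-ins so the sketch runs end to end (assumptions, not real models).
def toy_evaluator(trajectory: Trajectory) -> float:
    return 1.0 if trajectory and trajectory[-1] == "submit" else 0.0


def toy_agent(feedback: str) -> Trajectory:
    return ["click", "submit"] if feedback else ["click"]
```

In both cases the evaluator is the only source of signal, which is the sense in which no extra supervision is needed; in practice the evaluator and agent would be learned models rather than the toy functions above.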

Publication
COLM 2024
Jiayi Pan
潘家怡