Autonomous Evaluation and Refinement of Digital Agents
Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr
September 2024
Abstract
COLM 2024 / ⭐️ MAR Workshop @ CVPR 2024 Best Paper.
We design model-based evaluators to both evaluate and autonomously improve agents' performance. We show that these open-ended evaluators can significantly improve agents' performance, through either fine-tuning or inference-time guidance, without any extra supervision.
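As a rough illustration of the two uses named above, the sketch below shows how an evaluator's judgment could drive inference-time retries and could filter trajectories for fine-tuning. It is not the authors' implementation. The `Trajectory` type, the agent and evaluator signatures, and the toy agent and evaluator are all hypothetical stand-ins chosen only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Trajectory:
    """Hypothetical record of one agent attempt at a task."""
    task: str
    actions: List[str]


# Assumed interfaces (not from the paper):
#   agent(task, feedback) -> Trajectory
#   evaluator(trajectory) -> (success, critique)
Agent = Callable[[str, Optional[str]], Trajectory]
Evaluator = Callable[[Trajectory], Tuple[bool, str]]


def run_with_guidance(agent: Agent, evaluator: Evaluator, task: str,
                      max_attempts: int = 3) -> Trajectory:
    """Inference-time guidance: retry while the evaluator reports failure."""
    feedback: Optional[str] = None
    trajectory = agent(task, feedback)
    for _ in range(max_attempts - 1):
        success, critique = evaluator(trajectory)
        if success:
            break
        # Pass the evaluator's critique to the next attempt.
        feedback = critique
        trajectory = agent(task, feedback)
    return trajectory


def filter_for_finetuning(trajectories: List[Trajectory],
                          evaluator: Evaluator) -> List[Trajectory]:
    """Keep only trajectories the evaluator judges successful."""
    return [t for t in trajectories if evaluator(t)[0]]


# Toy stand-ins so the sketch runs end to end.
def toy_agent(task: str, feedback: Optional[str]) -> Trajectory:
    # The first attempt skips a step; any feedback triggers the full plan.
    if feedback is None:
        return Trajectory(task, ["click submit"])
    return Trajectory(task, ["fill form", "click submit"])


def toy_evaluator(trajectory: Trajectory) -> Tuple[bool, str]:
    if "fill form" in trajectory.actions:
        return True, "looks complete"
    return False, "the form was submitted without being filled in"


result = run_with_guidance(toy_agent, toy_evaluator, "submit the form")
kept = filter_for_finetuning(
    [Trajectory("submit the form", ["click submit"]), result], toy_evaluator
)
```

Here no human labels enter the loop: the evaluator's verdict is the only signal, both for deciding whether to retry and for choosing which trajectories would be kept as fine-tuning data.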