DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
Hao Bai, Yifei Zhou, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar
June 2024
Hao Bai*, Yifei Zhou*, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar. NeurIPS 2024.
Abstract
We develop reinforcement learning techniques to post-train device-control language agents. Our 2B VLM, when post-trained with an autonomous evaluator, improves its success rate from 17% to 67% on Android device-control tasks.