Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Yuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine
June 2024
Abstract
Yuexiang Zhai, Hao Bai*, Zipeng Lin*, Jiayi Pan*, Shengbang Tong*, Yifei Zhou*, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine. NeurIPS 2024.
We provide infrastructure and environments for training vision-language models (VLMs) with reinforcement learning (RL) on decision-making tasks. We show that RL training enables our 7B model to outperform GPT-4V on these tasks. We also show that chain-of-thought (CoT) reasoning is intriguingly effective at improving performance.
Publication
Preprint, Under Review