ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar
January 2024. ICML 2024.

Abstract
We present ArCHer, a new framework for building multi-turn reinforcement learning (RL) algorithms that train language model (LM) agents. It preserves the flexibility of mainstream single-turn RL methods for LMs, such as proximal policy optimization (PPO), while effectively handling multiple turns, long horizons, and delayed rewards.
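To make "delayed rewards" concrete, here is a minimal, generic sketch. It is not ArCHer's algorithm, which this abstract does not detail. The sketch treats each turn (one full utterance) as a single time step and computes discounted returns when the only reward arrives after the final turn of a conversation. The function name `turn_level_returns` and the discount `gamma = 0.9` are illustrative assumptions.

```python
from typing import List


def turn_level_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Discounted return per turn: G_t = r_t + gamma * G_{t+1}.

    Hypothetical helper for illustration only: each index is one turn
    (a whole utterance), not one token.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so a reward at the last turn propagates to earlier turns.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A 4-turn episode where the only reward arrives after the final turn.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
print(turn_level_returns(episode_rewards))
```

Every earlier turn gets a nonzero return even though its immediate reward is zero. A method that scores each utterance only by its own immediate reward would receive no learning signal for those turns, and the signal shrinks geometrically as the horizon grows.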