Hi 👋

I am a first-year PhD student at Berkeley AI Research, advised by Prof. Alane Suhr. I received an Outstanding Paper Award at ACL 2023 and won the Amazon Alexa Prize SimBot Challenge in 2023. My current research focuses on grounding language to multi-modal perception and agents.

Before that, from 2019 to 2023, I was a happy undergrad at the University of Michigan and Shanghai Jiao Tong University, working with Professors Joyce Chai, Dmitry Berenson, and Fan Wu.

I continuously reassess and pursue my ideal lifestyle. Feedback is always welcome :)

Publications & Manuscripts

* denotes equal contribution
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar. Preprint 2024.

We present ArCHer, a framework for building multi-turn RL algorithms for training LLM agents. It preserves the flexibility of existing single-turn RL methods for LLMs like PPO, while accommodating multiple turns, long horizons, and delayed rewards effectively.
Inversion-Free Image Editing with Natural Language

Sihan Xu*, Yidong Huang*, Jiayi Pan, Ziqiao Ma, Joyce Chai. CVPR 2024.

We present an inversion-free editing (InfEdit) method that allows for consistent natural language guided image editing. Extensive experiments show that InfEdit excels in complex editing tasks and is ~10X faster than prior methods.
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?

Yichi Zhang, Jiayi Pan, Yuchen Zhou, Rui Pan, Joyce Chai. EMNLP 2023.

Do Vision-Language Models, an emergent human-computer interface, experience visual illusions similarly to humans, or do they accurately depict reality? We created a new dataset, GVIL, to investigate this. Among other findings, we discovered that larger models tend to be more susceptible to visual illusions, aligning more closely with human perception.
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models

Ziqiao Ma*, Jiayi Pan*, Joyce Chai. ⭐️ ACL 2023 Outstanding Paper.

We present Grounded Open Vocabulary Acquisition (GOVA) for exploring grounding and bootstrapping in open-world language learning. Our visually-grounded language model, OctoBERT, emphasizes grounding as an objective. Experiments show that OctoBERT outperforms baselines in learning grounded words quickly and robustly, particularly for unseen words.
SEAGULL: An Embodied Agent for Instruction Following through Situated Dialog

Team SEAGULL at UMich. 🏆 1st Place in the inaugural Alexa Prize SimBot Challenge.

We introduce SEAGULL, an interactive embodied agent designed for the Alexa Prize SimBot Challenge, which completes complex tasks in the Arena simulation environment through dialog with users. SEAGULL is engineered to be efficient, user-centric, and continuously improving.
Data-Efficient Learning of Natural Language to Linear Temporal Logic Translators for Robot Task Specification

Jiayi Pan, Glen Chou, Dmitry Berenson. ICRA 2023.

We present a learning-based approach that translates natural language commands to LTL specifications using only a handful of labeled examples. It enables few-shot learning of LTL translators while achieving state-of-the-art performance.
DANLI: Deliberative Agent for Following Natural Language Instructions

Yichi Zhang, Jianing Yang, Jiayi Pan, Shane Storks, Nikhil Devraj, Ziqiao Ma, Keunwoo Peter Yu, Yuwei Bao, Joyce Chai. EMNLP 2022, Oral.

We introduce DANLI, a neuro-symbolic deliberative agent that proactively reasons and plans according to its past experiences. DANLI achieves a 70% improvement on the challenging TEACh benchmark while improving transparency and explainability in its behaviors.

Contact

  • Email: jiayipan [AT] berkeley [DOT] edu

Misc

  • I try to keep notes on what I consume and learn from. You can find them here.
  • I try to develop some habits. Currently, I am learning guitar and music theory.
  • Growing up, I lived in quite a few places: Chongqing, Xinyang, Chengdu, Shanghai, Ann Arbor, and now the Bay Area.