
Unveiling Markov Heads in Pretrained Language Models for Offline Reinforcement Learning

Main: 8 pages · Appendix: 4 pages · Bibliography: 3 pages · 10 figures · 8 tables
Abstract

Recently, incorporating knowledge from pretrained language models (PLMs) into decision transformers (DTs) has attracted significant attention in offline reinforcement learning (RL). These PLMs perform well on RL tasks, raising an intriguing question: what kind of knowledge from PLMs is transferred to RL to achieve such good results? This work dives into this problem by quantitatively analyzing each attention head and identifies the Markov head, a crucial component present in the attention heads of PLMs. A Markov head places extreme attention on the last input token and performs well only in short-term environments. Furthermore, we prove that this extreme attention cannot be changed by re-training the embedding layer or by fine-tuning. Inspired by our analysis, we propose a general method, GPT2-DTMA, which equips a pretrained DT with a Mixture of Attention (MoA) to accommodate diverse attention requirements during fine-tuning. Extensive experiments corroborate our theorems and demonstrate the effectiveness of GPT2-DTMA: it achieves comparable performance in short-term environments while significantly narrowing the performance gap in long-term environments.
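
The head-level analysis described above can be approximated with standard tooling. The sketch below (not the authors' released code) measures, for every attention head of a pretrained GPT-2, how much attention mass each query position places on the most recent input token; heads whose mass is consistently near 1 would be candidate Markov heads. The 0.9 cutoff is an illustrative assumption, not a value taken from the paper.

```python
# Minimal sketch: per-head attention mass on the last input token in pretrained GPT-2.
import torch
from transformers import GPT2Model, GPT2Tokenizer

model = GPT2Model.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

# Any token sequence works; an RL trajectory would be embedded similarly in a DT.
inputs = tokenizer("observation reward action observation reward action", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: one tensor per layer, shape (batch, n_heads, seq_len, seq_len).
for layer_idx, attn in enumerate(outputs.attentions):
    # With causal masking, the diagonal entry is the attention a query position
    # pays to the current (i.e., last observed) token.
    last_token_mass = attn.diagonal(dim1=-2, dim2=-1).mean(dim=-1)  # (batch, n_heads)
    for head_idx, mass in enumerate(last_token_mass[0]):
        if mass > 0.9:  # hypothetical threshold for "extreme" attention
            print(f"layer {layer_idx}, head {head_idx}: mean last-token attention {mass:.2f}")
```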

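One plausible reading of the Mixture of Attention (MoA) idea is a learned, input-dependent gate that blends a frozen pretrained attention block (which may be dominated by Markov heads) with a freshly initialized attention block, letting fine-tuning route long-term credit assignment to the new branch. This is a hedged sketch of that reading, not the paper's exact GPT2-DTMA implementation; all module and parameter names are illustrative.

```python
# Illustrative MoA layer: token-wise convex combination of two attention branches.
import torch
import torch.nn as nn

class MixtureOfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, pretrained_attn: nn.Module):
        super().__init__()
        self.pretrained_attn = pretrained_attn  # frozen pretrained attention block
        self.new_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(d_model, 1)       # per-token mixing weight

    def forward(self, x: torch.Tensor, attn_mask=None) -> torch.Tensor:
        out_pre = self.pretrained_attn(x)        # assumed to return (batch, seq, d_model)
        out_new, _ = self.new_attn(x, x, x, attn_mask=attn_mask)
        alpha = torch.sigmoid(self.gate(x))      # (batch, seq, 1), in [0, 1]
        return alpha * out_pre + (1.0 - alpha) * out_new
```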