Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents

4 May 2025
Minzheng Wang
Yongbin Li
Haobo Wang
Xinghua Zhang
Nan Xu
Bingli Wu
Fei Huang
Haiyang Yu
Wenji Mao
    LLMAG
    LRM
Abstract

Effective social intelligence simulation requires language agents to dynamically adjust reasoning depth, a capability notably absent in current approaches. Existing methods either lack this kind of reasoning capability or enforce uniform long chain-of-thought reasoning across all scenarios, resulting in excessive token usage and inappropriate social simulation. In this paper, we propose Adaptive Mode Learning (AML), which strategically selects from four thinking modes (intuitive reaction → deep contemplation) based on real-time context. Our framework's core innovation, the Adaptive Mode Policy Optimization (AMPO) algorithm, introduces three key advancements over existing methods: (1) multi-granular thinking mode design, (2) context-aware mode switching across social interaction, and (3) token-efficient reasoning via depth-adaptive processing. Extensive experiments on social intelligence tasks confirm that AML achieves 15.6% higher task performance than state-of-the-art methods. Notably, our method outperforms GRPO by 7.0% with 32.8% shorter reasoning chains. These results demonstrate that context-sensitive thinking mode selection, as implemented in AMPO, enables more human-like adaptive reasoning than GRPO's fixed-depth approach.
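The abstract's core idea — choosing one of four thinking modes of increasing depth based on the current context — can be sketched as follows. Note this is an illustrative toy: the mode names, complexity thresholds, and token budgets below are assumptions, and the actual AMPO algorithm learns the switching policy via reinforcement learning rather than using fixed heuristics.

```python
from enum import Enum

class ThinkingMode(Enum):
    # Four modes from intuitive reaction to deep contemplation,
    # ordered by reasoning depth (names are illustrative).
    INTUITIVE = 1
    SHALLOW = 2
    MODERATE = 3
    DEEP = 4

# Hypothetical per-mode token budgets: deeper modes permit longer
# chains of thought, so token cost grows with depth.
TOKEN_BUDGET = {
    ThinkingMode.INTUITIVE: 32,
    ThinkingMode.SHALLOW: 128,
    ThinkingMode.MODERATE: 512,
    ThinkingMode.DEEP: 2048,
}

def select_mode(context_complexity: float) -> ThinkingMode:
    """Toy stand-in for the learned policy: map an estimated
    context-complexity score in [0, 1] to a thinking mode."""
    if context_complexity < 0.25:
        return ThinkingMode.INTUITIVE
    if context_complexity < 0.5:
        return ThinkingMode.SHALLOW
    if context_complexity < 0.75:
        return ThinkingMode.MODERATE
    return ThinkingMode.DEEP
```

The token savings the paper reports come from spending the large DEEP budget only on contexts that warrant it, instead of applying uniform long chain-of-thought everywhere.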

@article{wang2025_2505.02156,
  title={Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents},
  author={Minzheng Wang and Yongbin Li and Haobo Wang and Xinghua Zhang and Nan Xu and Bingli Wu and Fei Huang and Haiyang Yu and Wenji Mao},
  journal={arXiv preprint arXiv:2505.02156},
  year={2025}
}