
Dynamic Adversarial Reinforcement Learning for Robust Multimodal Large Language Models

Yicheng Bao
Xuhong Wang
Qiaosheng Zhang
Chaochao Lu
Xia Hu
Xin Tan
Main: 8 pages · Appendix: 10 pages · Bibliography: 3 pages · 13 figures · 11 tables
Abstract

Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex scenes. This weakness stems from a reliance on finite training datasets, which are prohibitively expensive to scale and impose a ceiling on model robustness. We introduce AOT-SFT, a large-scale adversarial dataset for bootstrapping MLLM robustness. Building on this, we propose AOT (Adversarial Opponent Training), a self-play framework that forges MLLM robustness by creating its own training data. Our method orchestrates a co-evolution between an image-editing Attacker and a Defender MLLM, where the Attacker generates a diverse and dynamic curriculum of image manipulations, forcing the Defender to adapt and improve. Extensive experiments demonstrate that AOT enhances the Defender's perceptual robustness and reduces hallucinations, establishing a scalable paradigm for training more reliable MLLMs.
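The attacker-defender co-evolution described above can be sketched as a simple self-play loop. The code below is a toy illustration under stated assumptions, not the paper's actual method: perturbation strengths and robustness are scalars, the defender's "RL update" is a hand-rolled reward step, and the attacker escalates its curriculum whenever the defender succeeds. All function names and update rules are hypothetical.

```python
import random

random.seed(0)


def defender_score(robustness: float, perturbation: float) -> float:
    """Toy success probability: how often the defender answers correctly
    when facing a manipulation of the given strength (assumed model)."""
    return max(0.0, min(1.0, robustness - perturbation))


def self_play(rounds: int = 50, lr: float = 0.05):
    """Run the illustrative attacker-defender co-evolution loop."""
    robustness = 0.5   # defender's starting perceptual robustness
    curriculum = 0.2   # attacker's current manipulation difficulty
    for _ in range(rounds):
        # Attacker samples an image manipulation near its current difficulty.
        perturbation = curriculum + random.uniform(-0.05, 0.05)
        success = defender_score(robustness, perturbation)
        # Defender adapts: failures (low success) drive a larger update,
        # standing in for the RL reward signal in the actual framework.
        robustness += lr * (1.0 - success) * 0.5
        # Attacker escalates when the defender copes, eases off otherwise,
        # producing a dynamic curriculum rather than a fixed dataset.
        curriculum += lr * (success - 0.5)
    return robustness, curriculum


final_robustness, final_curriculum = self_play()
print(final_robustness > 0.5)  # defender improved over its starting point
```

The key design point this sketch captures is that the training distribution is not fixed: the attacker's curriculum shifts in response to the defender, so the defender never exhausts a static dataset.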
