NExT-GPT: Any-to-Any Multimodal LLM

International Conference on Machine Learning (ICML), 2023
11 September 2023
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
    MLLM

Papers citing "NExT-GPT: Any-to-Any Multimodal LLM"

50 / 240 papers shown
Title
VER-Bench: Evaluating MLLMs on Reasoning with Fine-Grained Visual Evidence
Chenhui Qiang
Zhaoyang Wei
Xumeng Han
Zipeng Wang
Siyao Li
Xiangyuan Lan
Jianbin Jiao
Zhenjun Han
LRM
60
2
0
06 Aug 2025
Multi-TW: Benchmarking Multimodal Models on Traditional Chinese Question Answering in Taiwan
Jui-Ming Yao
Bing-Cheng Xie
Sheng-Wei Peng
Hao-Yuan Chen
He-Rong Zheng
Bing-Jia Tan
Peter Shaojui Wang
Shun-Feng Su
56
0
0
02 Aug 2025
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
Zigang Geng
Y. Wang
Yeyao Ma
Chen Li
Yongming Rao
...
Han Hu
Xiaosong Zhang
Linus
Di Wang
Jie Jiang
146
24
0
29 Jul 2025
Advancing Compositional LLM Reasoning with Structured Task Relations in Interactive Multimodal Communications
IEEE Journal on Selected Areas in Communications (JSAC), 2025
Xinye Cao
Hongcan Guo
Guoshun Nan
Jiaoyang Cui
Haoting Qian
...
Diyang Zhang
Yanzhao Hou
Huici Wu
Xiaofeng Tao
Tony Q. S. Quek
118
0
0
28 Jul 2025
MLLM-based Speech Recognition: When and How is Multimodality Beneficial?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
183
1
0
25 Jul 2025
UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification
Xixi Wan
Aihua Zheng
Bo Jiang
Beibei Wang
Chenglong Li
Jin Tang
58
0
0
07 Jul 2025
MotionGPT3: Human Motion as a Second Modality
Bingfan Zhu
Biao Jiang
S. Wang
Bin Wang
Tao Chen
Linjie Luo
Youyi Zheng
Xin Chen
235
2
0
30 Jun 2025
XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge
Yu Zhang
Xi Zhang
Hualin Zhou
Xinyuan Chen
Shang Gao
Hong Jia
Jianfei Yang
Yuankai Qi
Tao Gu
162
0
0
28 Jun 2025
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing
Huadai Liu
Kaicheng Luo
Jialei Wang
Wen Wang
Qian Chen
Zhou Zhao
Wei Xue
VGen
LRM
289
13
0
26 Jun 2025
ReactEMG: Zero-Shot, Low-Latency Intent Detection via sEMG
Runsheng Wang
Xinyue Zhu
Ava Chen
Jingxi Xu
Lauren Winterbottom
Dawn M. Nilsen
J. Stein
M. Ciocarlie
135
0
0
24 Jun 2025
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
Teng Li
Quanfeng Lu
Lirui Zhao
Hao Li
X. Zhu
Yu Qiao
Jun Zhang
Wenqi Shao
176
4
0
20 Jun 2025
From LLM-anation to LLM-orchestrator: Coordinating Small Models for Data Labeling
Yao Lu
Zhaiyuan Ji
Jiawei Du
Yu Shanqing
Qi Xuan
Tianyi Zhou
172
6
0
19 Jun 2025
Show-o2: Improved Native Unified Multimodal Models
Jinheng Xie
Zhenheng Yang
Mike Zheng Shou
VGen
379
73
0
18 Jun 2025
Enhancing Hyperbole and Metaphor Detection with Their Bidirectional Dynamic Interaction and Emotion Knowledge
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Li Zheng
Sihang Wang
Hao Fei
Zuquan Peng
Fei Li
Jianming Fu
Chong Teng
Donghong Ji
154
2
0
18 Jun 2025
Pisces: An Auto-regressive Foundation Model for Image Understanding and Generation
Zhiyang Xu
Jiuhai Chen
Zhaojiang Lin
Xichen Pan
Lifu Huang
...
Di Jin
Michihiro Yasunaga
Lili Yu
Xi Lin
Shaoliang Nie
286
4
0
12 Jun 2025
DanceChat: Large Language Model-Guided Music-to-Dance Generation
Qing Wang
Xiaohang Yang
Yilan Dong
Naveen Raj Govindaraj
Gregory Slabaugh
Shanxin Yuan
214
0
0
12 Jun 2025
Revolutionizing Clinical Trials: A Manifesto for AI-Driven Transformation
M. Schaar
Richard W. Peck
E. McKinney
Jim Weatherall
Stuart Bailey
...
Rafik Salama
Christina Gunther
Francesca Frau
Antoine Pugeat
Ramon Hernandez
MedIm
208
8
0
10 Jun 2025
Multimodal Representation Alignment for Cross-modal Information Retrieval
Fan Xu
Luis A. Leiva
167
1
0
10 Jun 2025
What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities
Wendong Bu
Yang Wu
Qifan Yu
Minghe Gao
Bingchen Miao
...
Mengze Li
Wei Ji
Juncheng Billy Li
Siliang Tang
Yueting Zhuang
ELM
137
1
0
10 Jun 2025
Uncertainty-o: One Model-agnostic Framework for Unveiling Uncertainty in Large Multimodal Models
Ruiyang Zhang
Hu Zhang
Hao Fei
Zhedong Zheng
UQCV
198
0
0
09 Jun 2025
SUDER: Self-Improving Unified Large Multimodal Models for Understanding and Generation with Dual Self-Rewards
Jixiang Hong
Yiran Zhang
Guanzhong Wang
Yi Liu
Ji-Rong Wen
Rui Yan
LRM
170
1
0
09 Jun 2025
EgoM2P: Egocentric Multimodal Multitask Pretraining
Gen Li
Yutong Chen
Yiqian Wu
Kaifeng Zhao
Marc Pollefeys
Siyu Tang
EgoV
VLM
299
3
0
09 Jun 2025
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation
Jinting Wang
Shan Yang
Chenxing Li
Dong Yu
Li Liu
263
0
0
04 Jun 2025
Resolving Task Objective Conflicts in Unified Model via Task-Aware Mixture-of-Experts
Jiaxing Zhang
Xinyi Zeng
205
0
0
04 Jun 2025
How Far Are We from Generating Missing Modalities with Foundation Models?
Guanzhou Ke
Yi Xie
Xiaoli Wang
Guoqing Chao
Bo Wang
VLM
230
0
0
04 Jun 2025
HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation
Yicheng Xiao
Lin Song
Rui Yang
Cheng Cheng
Zunnan Xu
Zhaoyang Zhang
Yixiao Ge
Xiu Li
Mingyu Ding
180
5
0
03 Jun 2025
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang
Yao Lai
Aoxue Li
Shifeng Zhang
Jiacheng Sun
Ning Kang
Chengyue Wu
Zhenguo Li
Ping Luo
278
16
0
26 May 2025
SATORI-R1: Incentivizing Multimodal Reasoning through Explicit Visual Anchoring
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
326
8
0
25 May 2025
Sensorimotor features of self-awareness in multimodal large language models
Iñaki Dellibarda Varela
Pablo Romero-Sorozabal
Diego Torricelli
Gabriel Delgado-Oleas
Jose Ignacio Serrano
Maria Dolores del Castillo Sobrino
Eduardo Rocon
Manuel Cebrian
94
0
0
25 May 2025
Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junlin Li
Guodong DU
Jing Li
Sim Kuan Goh
Wenya Wang
...
Fangming Liu
Jing Li
Saleh Alharbi
Daojing He
Min Zhang
MoMe
CLL
286
1
0
21 May 2025
MMaDA: Multimodal Large Diffusion Language Models
Ling Yang
Ye Tian
Bowen Li
Xinchen Zhang
Ke Shen
Yunhai Tong
Mengdi Wang
VLM
LRM
429
98
0
21 May 2025
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
Guangke Chen
Fu Song
Zhe Zhao
Xiaojun Jia
Yang Liu
Yanchen Qiao
Weizhe Zhang
AuLLM
AAML
353
3
0
20 May 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
Ming Ma
Xiaomin Yu
Pengxiang Ding
Han Zhao
Mingyang Sun
Siteng Huang
Xuetao Zhang
LRM
434
19
0
18 May 2025
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Gang Qu
Zhenan Sun
Mingyu Ding
MLLM
VLM
366
30
0
08 May 2025
ALFEE: Adaptive Large Foundation Model for EEG Representation
Wei Xiong
Junming Lin
Jiangtong Li
Jie Li
Changjun Jiang
230
2
0
07 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
967
25
0
05 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
441
18
0
30 Apr 2025
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Minh-Hao Van
Xintao Wu
VLM
307
0
0
30 Apr 2025
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
Jianyu Wu
Yizhou Wang
Xiangyu Yue
Cheng Wang
Jinpei Guo
Dongzhan Zhou
Wanli Ouyang
Bin Wang
284
2
0
29 Apr 2025
Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models
X. Wang
Haoyang Li
Zeyang Zhang
Wenwu Zhu
LRM
305
1
0
28 Apr 2025
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
Mohamed Gado
Towhid Taliee
Muhammad Memon
D. Ignatov
Radu Timofte
378
3
0
27 Apr 2025
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jaewoo Lee
Keyang Xuan
Chanakya Ekbote
Sandeep Polisetty
Yi R. Fung
Paul Pu Liang
VLM
288
2
0
14 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
289
14
0
03 Apr 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
368
4
0
29 Mar 2025
AutoComPose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs
Yi-Ting Shen
Sungmin Eum
Doheon Lee
Rohit Shete
Chiao-Yi Wang
H. Kwon
Shuvra S. Bhattacharyya
317
0
0
28 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jinfa Huang
Jie Lou
Debing Zhang
Rongrong Ji
407
6
0
26 Mar 2025
Audio-centric Video Understanding Benchmark without Text Shortcut
Yue Yang
Jimin Zhuang
Guangzhi Sun
Changli Tang
Yongqian Li
P. Li
Yifan Jiang
W. Li
Tianhao Shen
Chao Zhang
AuLLM
CoGe
338
0
0
25 Mar 2025
Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark
Bingchen Miao
Y. Wu
Minghe Gao
Qifan Yu
Wendong Bu
Wenqiao Zhang
Yunfei Li
Siliang Tang
Tat-Seng Chua
Juncheng Billy Li
LLMAG
LRM
312
3
0
24 Mar 2025
Continual Multimodal Contrastive Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
CLL
571
8
0
19 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
Huanjin Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
305
186
0
17 Mar 2025