ResearchTrend.AI
NExT-GPT: Any-to-Any Multimodal LLM


11 September 2023
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
    MLLM

Papers citing "NExT-GPT: Any-to-Any Multimodal LLM"

50 / 336 papers shown
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Haokun Lin
Teng Wang
Yixiao Ge
Yuying Ge
Zhichao Lu
Ying Wei
Qingfu Zhang
Zhenan Sun
Ying Shan
MLLM
VLM
64
0
0
08 May 2025
ALFEE: Adaptive Large Foundation Model for EEG Representation
Wei Xiong
Junming Lin
Jiangtong Li
Jie Li
Changjun Jiang
18
0
0
07 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
57
0
0
05 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
C. L. P. Chen
J. Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
53
0
0
30 Apr 2025
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Minh-Hao Van
Xintao Wu
VLM
79
0
0
30 Apr 2025
Turing Machine Evaluation for Large Language Model
Haitao Wu
Zongbo Han
Huaxi Huang
Changqing Zhang
ELM
LRM
59
0
0
29 Apr 2025
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
Jianyu Wu
Yizhou Wang
Xiangyu Yue
Xinzhu Ma
J. Guo
Dongzhan Zhou
Wanli Ouyang
Shixiang Tang
66
0
0
29 Apr 2025
Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models
X. Wang
Haoyang Li
Zeyang Zhang
H. Chen
Wenwu Zhu
LRM
77
0
0
28 Apr 2025
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
Mohamed Gado
Towhid Taliee
Muhammad Memon
D. Ignatov
Radu Timofte
63
0
0
27 Apr 2025
Token Sequence Compression for Efficient Multimodal Computing
Yasmine Omri
Parth Shroff
Thierry Tambe
51
0
0
24 Apr 2025
A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms
Chengkai Huang
Hongtao Huang
Tong Yu
Kaige Xie
Junda Wu
Shuai Zhang
Julian McAuley
Dietmar Jannach
Lina Yao
LRM
AI4CE
22
0
0
23 Apr 2025
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
Haojian Huang
Haodong Chen
Shengqiong Wu
Meng Luo
Jinlan Fu
Xinya Du
H. Zhang
Hao Fei
AI4TS
65
0
0
17 Apr 2025
A Survey on Cross-Modal Interaction Between Music and Multimodal Data
Sifei Li
Mining Tan
Feier Shen
Minyan Luo
Zijiao Yin
Fan Tang
W. Dong
Changsheng Xu
57
0
0
17 Apr 2025
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
Jaewoo Lee
Keyang Xuan
Chanakya Ekbote
Sandeep Polisetty
Yi Ren Fung
Paul Pu Liang
VLM
37
0
0
14 Apr 2025
Multimodal Long Video Modeling Based on Temporal Dynamic Context
Haoran Hao
Jiaming Han
Yiyuan Zhang
Xiangyu Yue
32
0
0
14 Apr 2025
Transfer between Modalities with MetaQueries
Xichen Pan
Satya Narayan Shukla
Aashu Singh
Zhuokai Zhao
Shlok Kumar Mishra
...
Jiuhai Chen
Kunpeng Li
F. Xu
Ji Hou
Saining Xie
DiffM
41
6
0
08 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
61
1
0
03 Apr 2025
VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence
Hao Li
Hao Fei
Zechao Hu
Zhengwei Yang
Zheng Wang
45
0
0
03 Apr 2025
Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity
Kotaro Inoue
34
0
0
31 Mar 2025
Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury
Hanan Gani
Nishit Anand
Sayan Nag
Ruohan Gao
Mohamed Elhoseiny
Salman Khan
Dinesh Manocha
LRM
36
0
0
29 Mar 2025
AutoComPose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs
Yi-Ting Shen
Sungmin Eum
Doheon Lee
Rohit Shete
Chiao-Yi Wang
H. Kwon
Shuvra S. Bhattacharyya
37
0
0
28 Mar 2025
SocialGen: Modeling Multi-Human Social Interaction with Language Models
Heng Yu
Juze Zhang
Changan Chen
Tiange Xiang
Yusu Fang
Juan Carlos Niebles
Ehsan Adeli
VGen
47
0
0
28 Mar 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM
VGen
74
1
0
27 Mar 2025
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
Xiaoqin Wang
Xusen Ma
Xianxu Hou
Meidan Ding
Yudong Li
Junliang Chen
Wenting Chen
Xiaoyang Peng
LinLin Shen
CVBM
71
0
0
27 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
90
0
0
26 Mar 2025
ACVUBench: Audio-Centric Video Understanding Benchmark
Y. Yang
Jimin Zhuang
Guangzhi Sun
Changli Tang
Y. Li
P. Li
Yifan Jiang
W. Li
Z. Ma
Chao Zhang
AuLLM
CoGe
53
0
0
25 Mar 2025
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Qihui Zhang
Munan Ning
Zheyuan Liu
Yanbo Wang
Jiayi Ye
Yue Huang
Shuo Yang
Xiao Chen
Y. Song
Li Yuan
LRM
56
0
0
19 Mar 2025
Continual Multimodal Contrastive Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
CLL
54
0
0
19 Mar 2025
Universal Scene Graph Generation
Shengqiong Wu
Hao Fei
Tat-Seng Chua
36
0
0
19 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
H. Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
52
17
0
17 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Y. Wang
Shengqiong Wu
Y. Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
80
7
0
16 Mar 2025
Multi-Granular Multimodal Clue Fusion for Meme Understanding
Li Zheng
Hao Fei
Ting Dai
Zuquan Peng
Fei Li
Huisheng Ma
Chong Teng
Donghong Ji
50
0
0
16 Mar 2025
Cross-Modal Learning for Music-to-Music-Video Description Generation
Zhuoyuan Mao
Mengjie Zhao
Qiyu Wu
Zhi-Wei Zhong
Wei-Hsiang Liao
Hiromi Wakaki
Yuki Mitsufuji
DiffM
VGen
73
0
0
14 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Y. Guo
65
3
0
13 Mar 2025
TA-V2A: Textually Assisted Video-to-Audio Generation
Yuhuan You
Xihong Wu
T. Qu
DiffM
45
0
0
12 Mar 2025
Learning to Match Unpaired Data with Minimum Entropy Coupling
Mustapha Bounoua
Giulio Franzese
Pietro Michiardi
31
0
0
11 Mar 2025
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment
Xing Xie
Jiawei Liu
Ziyue Lin
Huijie Fan
Zhi-Long Han
Yandong Tang
Liangqiong Qu
40
0
0
10 Mar 2025
LLaVA-RadZ: Can Multimodal Large Language Models Effectively Tackle Zero-shot Radiology Recognition?
Bangyan Li
Wenxuan Huang
Yunhang Shen
Y. Wang
Shaohui Lin
...
Ling You
Yinqi Zhang
Ke Li
Xing Sun
Y. Sun
47
1
0
10 Mar 2025
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Z. Chen
Chunwei Wang
Xiuwei Chen
Hang Xu
J. Han
Xiandan Liang
VLM
67
1
0
09 Mar 2025
ToFu: Visual Tokens Reduction via Fusion for Multi-modal, Multi-patch, Multi-image Task
Vittorio Pippi
Matthieu Guillaumin
S. Cascianelli
Rita Cucchiara
M. Jaritz
Loris Bazzani
62
0
0
06 Mar 2025
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Rui Zhao
Weijia Mao
Mike Zheng Shou
64
0
0
05 Mar 2025
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
Zhifei Xie
Mingbao Lin
Z. Liu
Pengcheng Wu
Shuicheng Yan
Chunyan Miao
AuLLM
OffRL
LRM
76
5
0
04 Mar 2025
Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models
Tianjie Ju
Yi Hua
Hao Fei
Zhenyu Shao
Yubin Zheng
Haodong Zhao
M. Lee
W. Hsu
Zhuosheng Zhang
Gongshen Liu
43
0
0
03 Mar 2025
MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation
Yi Wang
Mushui Liu
Wanggui He
Longxiang Zhang
Z. Huang
...
H. Li
Weilong Dai
Mingli Song
Jie Song
Hao Jiang
MLLM
MoE
LRM
72
1
0
03 Mar 2025
GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development
Leming Shen
Qiang Yang
Xinyu Huang
Zijing Ma
Yuanqing Zheng
27
1
0
02 Mar 2025
How Deep is Love in LLMs' Hearts? Exploring Semantic Size in Human-like Cognition
Yao Yao
Yifei Yang
Xinbei Ma
Dongjie Yang
Zhuosheng Zhang
Zuchao Li
Hai Zhao
34
0
0
01 Mar 2025
Language Model Mapping in Multimodal Music Learning: A Grand Challenge Proposal
Daniel Y. Chin
Gus Xia
34
0
0
01 Mar 2025
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Tianyu Huai
Jie Zhou
Xingjiao Wu
Qin Chen
Qingchun Bai
Ze Zhou
Liang He
MoE
30
0
0
01 Mar 2025
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
Xiao Wang
Jingyun Hua
Weihong Lin
Y. Zhang
Fuzheng Zhang
Jianlong Wu
Di Zhang
Liqiang Nie
VLM
80
0
0
28 Feb 2025
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
L. Chen
S. Bai
Wenhao Chai
Weichu Xie
Haozhe Zhao
Leon Vinci
Junyang Lin
Baobao Chang
DiffM
82
4
0
27 Feb 2025