ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.05519
  4. Cited By
NExT-GPT: Any-to-Any Multimodal LLM

NExT-GPT: Any-to-Any Multimodal LLM

11 September 2023
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
    MLLM
ArXivPDFHTML

Papers citing "NExT-GPT: Any-to-Any Multimodal LLM"

50 / 336 papers shown
Title
A Survey of Large Language Models for Graphs
A Survey of Large Language Models for Graphs
Xubin Ren
Jiabin Tang
Dawei Yin
Nitesh V. Chawla
Chao Huang
21
30
0
10 May 2024
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
Zehan Wang
Ziang Zhang
Xize Cheng
Rongjie Huang
Luping Liu
...
Haifeng Huang
Yang Zhao
Tao Jin
Peng Gao
Zhou Zhao
18
8
0
08 May 2024
Octopi: Object Property Reasoning with Large Tactile-Language Models
Octopi: Object Property Reasoning with Large Tactile-Language Models
Samson Yu
Kelvin Lin
Anxing Xiao
Jiafei Duan
Harold Soh
LRM
29
21
0
05 May 2024
MANTIS: Interleaved Multi-Image Instruction Tuning
MANTIS: Interleaved Multi-Image Instruction Tuning
Dongfu Jiang
Xuan He
Huaye Zeng
Cong Wei
Max W.F. Ku
Qian Liu
Wenhu Chen
VLM
MLLM
28
32
0
02 May 2024
TheaterGen: Character Management with LLM for Consistent Multi-turn
  Image Generation
TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
Junhao Cheng
Baiqiao Yin
Kaixin Cai
Minbin Huang
Hanhui Li
...
Yue Li
Yifei Li
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffM
MLLM
26
12
0
29 Apr 2024
A Survey on Diffusion Models for Time Series and Spatio-Temporal Data
A Survey on Diffusion Models for Time Series and Spatio-Temporal Data
Yiyuan Yang
Ming Jin
Haomin Wen
Chaoli Zhang
Yuxuan Liang
...
Bin Yang
Zenglin Xu
Jiang Bian
Shirui Pan
Qingsong Wen
DiffM
AI4TS
SyDa
26
7
0
29 Apr 2024
WorldGPT: Empowering LLM as Multimodal World Model
WorldGPT: Empowering LLM as Multimodal World Model
Zhiqi Ge
Hongzhe Huang
Mingze Zhou
Juncheng Li
Guoming Wang
Siliang Tang
Yueting Zhuang
29
26
0
28 Apr 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal
  Models with Open-Source Suites
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
49
513
0
25 Apr 2024
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with
  Text-Rich Visual Comprehension
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Bohao Li
Yuying Ge
Yi Chen
Yixiao Ge
Ruimao Zhang
Ying Shan
VLM
30
39
0
25 Apr 2024
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Yuying Ge
Sijie Zhao
Jinguo Zhu
Yixiao Ge
Kun Yi
Lin Song
Chen Li
Xiaohan Ding
Ying Shan
VLM
57
103
0
22 Apr 2024
Retrieval-Augmented Embodied Agents
Retrieval-Augmented Embodied Agents
Yichen Zhu
Zhicai Ou
Xiaofeng Mou
Jian Tang
39
17
0
17 Apr 2024
Fact :Teaching MLLMs with Faithful, Concise and Transferable Rationales
Fact :Teaching MLLMs with Faithful, Concise and Transferable Rationales
Minghe Gao
Shuang Chen
Liang Pang
Yuan Yao
Jisheng Dang
Wenqiao Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
Tat-Seng Chua
LRM
30
5
0
17 Apr 2024
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
Mude Hui
Siwei Yang
Bingchen Zhao
Yichun Shi
Heng Wang
Peng Wang
Yuyin Zhou
Cihang Xie
25
37
0
15 Apr 2024
In-Context Translation: Towards Unifying Image Recognition, Processing,
  and Generation
In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation
Han Xue
Qianru Sun
Li-Na Song
Wenjun Zhang
Zhiwu Huang
MLLM
36
0
0
15 Apr 2024
A Survey on Multimodal Wearable Sensor-based Human Action Recognition
A Survey on Multimodal Wearable Sensor-based Human Action Recognition
Jianyuan Ni
Hao Tang
Syed Tousiful Haque
Yan Yan
A. Ngu
66
5
0
14 Apr 2024
Facial Affective Behavior Analysis with Instruction Tuning
Facial Affective Behavior Analysis with Instruction Tuning
Yifan Li
Anh Dao
Wentao Bao
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
CVBM
42
14
0
07 Apr 2024
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context
  and Dynamics of Human Interactions Within Social Groups
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
Simindokht Jahangard
Zhixi Cai
Shiki Wen
Hamid Rezatofighi
26
6
0
06 Apr 2024
Empowering Biomedical Discovery with AI Agents
Empowering Biomedical Discovery with AI Agents
Shanghua Gao
Ada Fang
Yepeng Huang
Valentina Giunchiglia
Ayush Noori
Jonathan Richard Schwarz
Yasha Ektefaie
Jovana Kondic
Marinka Zitnik
LLMAG
AI4CE
36
62
0
03 Apr 2024
Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models
Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models
Jiaxing Chen
Yuxuan Liu
Dehu Li
Xiang An
Weimo Deng
Ziyong Feng
Yongle Zhao
Yin Xie
LRM
38
12
0
28 Mar 2024
LITA: Language Instructed Temporal-Localization Assistant
LITA: Language Instructed Temporal-Localization Assistant
De-An Huang
Shijia Liao
Subhashree Radhakrishnan
Hongxu Yin
Pavlo Molchanov
Zhiding Yu
Jan Kautz
VLM
45
49
0
27 Mar 2024
Large Language Models for Education: A Survey and Outlook
Large Language Models for Education: A Survey and Outlook
Shen Wang
Tianlong Xu
Hang Li
Chaoli Zhang
Joleen Liang
Jiliang Tang
Philip S. Yu
Qingsong Wen
AI4Ed
28
84
0
26 Mar 2024
Ultra Low-Cost Two-Stage Multimodal System for Non-Normative Behavior
  Detection
Ultra Low-Cost Two-Stage Multimodal System for Non-Normative Behavior Detection
Albert Lu
Stephen Cranefield
22
0
0
24 Mar 2024
Modeling Unified Semantic Discourse Structure for High-quality Headline
  Generation
Modeling Unified Semantic Discourse Structure for High-quality Headline Generation
Minghui Xu
Hao Fei
Fei Li
Shengqiong Wu
Rui Sun
Chong Teng
Donghong Ji
27
0
0
23 Mar 2024
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Théophane Vallaeys
Mustafa Shukor
Matthieu Cord
Jakob Verbeek
52
12
0
20 Mar 2024
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Fucai Ke
Zhixi Cai
Simindokht Jahangard
Weiqing Wang
P. D. Haghighi
Hamid Rezatofighi
LRM
38
8
0
19 Mar 2024
RelationVLM: Making Large Vision-Language Models Understand Visual
  Relations
RelationVLM: Making Large Vision-Language Models Understand Visual Relations
Zhipeng Huang
Zhizheng Zhang
Zheng-Jun Zha
Yan Lu
Baining Guo
VLM
31
3
0
19 Mar 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model
3D-VLA: A 3D Vision-Language-Action Generative World Model
Haoyu Zhen
Xiaowen Qiu
Peihao Chen
Jincheng Yang
Xin Yan
Yilun Du
Yining Hong
Chuang Gan
LM&Ro
VGen
PINN
34
81
0
14 Mar 2024
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
Minbin Huang
Yanxin Long
Xinchi Deng
Ruihang Chu
Jiangfeng Xiong
Xiaodan Liang
Hong Cheng
Qinglin Lu
Wei Liu
MLLM
EGVM
59
8
0
13 Mar 2024
Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal
  Storyteller
Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller
Chuanqi Zang
Jiji Tang
Rongsheng Zhang
Zeng Zhao
Tangjie Lv
Mingtao Pei
Wei Liang
22
3
0
12 Mar 2024
SnapNTell: Enhancing Entity-Centric Visual Question Answering with
  Retrieval Augmented Multimodal LLM
SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM
Jielin Qiu
Andrea Madotto
Zhaojiang Lin
Paul A. Crook
Y. Xu
Xin Luna Dong
Christos Faloutsos
Lei Li
Babak Damavandi
Seungwhan Moon
26
8
0
07 Mar 2024
CLLMs: Consistency Large Language Models
CLLMs: Consistency Large Language Models
Siqi Kou
Lanxiang Hu
Zhe He
Zhijie Deng
Hao Zhang
34
26
0
28 Feb 2024
ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word
  Pre-Training
ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training
Le Zhuo
Zewen Chi
Minghao Xu
Heyan Huang
Heqi Zheng
Conghui He
Xian-Ling Mao
Wentao Zhang
41
11
0
28 Feb 2024
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi
Runpei Dong
Shaochen Zhang
Haoran Geng
Chunrui Han
Zheng Ge
Li Yi
Kaisheng Ma
33
49
0
27 Feb 2024
PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large
  Multimodal Models
PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models
Dingkun Guo
Yuqi Xiang
Shuqi Zhao
Xinghao Zhu
Masayoshi Tomizuka
Mingyu Ding
Wei Zhan
18
9
0
26 Feb 2024
Multi-Bit Distortion-Free Watermarking for Large Language Models
Multi-Bit Distortion-Free Watermarking for Large Language Models
Massieh Kordi Boroujeny
Ya Jiang
Kai Zeng
Brian L. Mark
WaLM
VLM
20
4
0
26 Feb 2024
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D
  Talking Face Generation
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
Yasheng Sun
Wenqing Chu
Hang Zhou
Kaisiyuan Wang
Hideki Koike
22
5
0
25 Feb 2024
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Yao Mu
Junting Chen
Qinglong Zhang
Shoufa Chen
Qiaojun Yu
...
Wenhai Wang
Jifeng Dai
Yu Qiao
Mingyu Ding
Ping Luo
37
20
0
25 Feb 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing
  Different Modalities as Different Languages
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
18
1
0
25 Feb 2024
LLMBind: A Unified Modality-Task Integration Framework
LLMBind: A Unified Modality-Task Integration Framework
Bin Zhu
Munan Ning
Peng Jin
Bin Lin
Jinfa Huang
...
Junwu Zhang
Zhenyu Tang
Mingjun Pan
Xing Zhou
Li-ming Yuan
MLLM
24
6
0
22 Feb 2024
Uncertainty-Aware Evaluation for Vision-Language Models
Uncertainty-Aware Evaluation for Vision-Language Models
Vasily Kostumov
Bulat Nutfullin
Oleg Pilipenko
Eugene Ilyushin
ELM
40
7
0
22 Feb 2024
User-LLM: Efficient LLM Contextualization with User Embeddings
User-LLM: Efficient LLM Contextualization with User Embeddings
Lin Ning
Luyang Liu
Jiaxing Wu
Neo Wu
D. Berlowitz
Sushant Prakash
Bradley Green
S. O’Banion
Jun Xie
37
32
0
21 Feb 2024
The Wolf Within: Covert Injection of Malice into MLLM Societies via an
  MLLM Operative
The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative
Zhen Tan
Chengshuai Zhao
Raha Moraffah
Yifan Li
Yu Kong
Tianlong Chen
Huan Liu
36
15
0
20 Feb 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current
  Methodologies and Future Directions
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Aman Chadha
VLM
41
23
0
20 Feb 2024
A Touch, Vision, and Language Dataset for Multimodal Alignment
A Touch, Vision, and Language Dataset for Multimodal Alignment
Letian Fu
Gaurav Datta
Huang Huang
Will Panitch
Jaimyn Drake
Joseph Ortiz
Mustafa Mukadam
Mike Lambeta
Roberto Calandra
Ken Goldberg
VLM
25
30
0
20 Feb 2024
A Survey on Knowledge Distillation of Large Language Models
A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu
Ming Li
Chongyang Tao
Tao Shen
Reynold Cheng
Jinyang Li
Can Xu
Dacheng Tao
Tianyi Zhou
KELM
VLM
42
94
0
20 Feb 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xipeng Qiu
MLLM
24
114
0
19 Feb 2024
G-Retriever: Retrieval-Augmented Generation for Textual Graph
  Understanding and Question Answering
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
Xiaoxin He
Yijun Tian
Yifei Sun
Nitesh V. Chawla
T. Laurent
Yann LeCun
Xavier Bresson
Bryan Hooi
RALM
104
68
0
12 Feb 2024
Is it safe to cross? Interpretable Risk Assessment with GPT-4V for
  Safety-Aware Street Crossing
Is it safe to cross? Interpretable Risk Assessment with GPT-4V for Safety-Aware Street Crossing
Hochul Hwang
Sunjae Kwon
Yekyung Kim
Donghyun Kim
25
3
0
09 Feb 2024
Large Language Models: A Survey
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
112
347
0
09 Feb 2024
It's Never Too Late: Fusing Acoustic Information into Large Language
  Models for Automatic Speech Recognition
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang
24
19
0
08 Feb 2024
Previous
1234567
Next