ResearchTrend.AI
NExT-GPT: Any-to-Any Multimodal LLM

11 September 2023
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
    MLLM

Papers citing "NExT-GPT: Any-to-Any Multimodal LLM"

50 / 336 papers shown
Chain-of-Description: What I can understand, I can put into words
J. Guo
Daimeng Wei
Z. Li
Hengchao Shang
Yuanchang Luo
Hao Yang
45
0
0
22 Feb 2025
SAE-V: Interpreting Multimodal Models for Enhanced Alignment
Hantao Lou
Changye Li
Jiaming Ji
Yaodong Yang
38
0
0
22 Feb 2025
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
L. Yang
Xinchen Zhang
Ye Tian
Chenming Shang
Minghao Xu
Wentao Zhang
Bin Cui
88
1
0
17 Feb 2025
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Zhenxing Mi
Kuan-Chieh Jackson Wang
Guocheng Qian
Hanrong Ye
Runtao Liu
Sergey Tulyakov
Kfir Aberman
Dan Xu
LRM
42
0
0
12 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Z. Yang
Mike Zheng Shou
MoE
63
0
0
10 Feb 2025
Parameter-Efficient Fine-Tuning for Foundation Models
Dan Zhang
Tao Feng
Lilong Xue
Yuandong Wang
Yuxiao Dong
J. Tang
37
6
0
23 Jan 2025
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
MLLM
VLM
LRM
100
4
0
21 Jan 2025
LASER: Lip Landmark Assisted Speaker Detection for Robustness
Le Thien Phuc Nguyen
Z. Yu
Yong Jae Lee
29
1
0
21 Jan 2025
Towards Advancing Code Generation with Large Language Models: A Research Roadmap
Haolin Jin
Huaming Chen
Qinghua Lu
Liming Zhu
LLMAG
36
1
0
20 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
95
16
0
17 Jan 2025
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
Hao Fei
Shengqiong Wu
Wei Ji
H. Zhang
M. Zhang
M. Lee
W. Hsu
LRM
VGen
44
55
0
08 Jan 2025
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Hao Fei
Shengqiong Wu
H. Zhang
Tat-Seng Chua
Shuicheng Yan
56
35
0
31 Dec 2024
AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues
Se Jin Park
Yeonju Kim
Hyeongseop Rha
Bella Godiva
Y. Ro
36
1
0
23 Dec 2024
CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models
Yeyuan Wang
D. Gao
Bin Li
Rujiao Long
Lei Yi
Xiaoyan Cai
Libin Yang
Jinxia Zhang
Shanqing Yu
Qi Xuan
68
1
0
22 Dec 2024
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Haozhao Wang
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
112
1
0
18 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
160
0
0
18 Dec 2024
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng
Qiguang Chen
Jin Zhang
Hao Fei
Xiaocheng Feng
Wanxiang Che
Min Li
L. Qin
VLM
MLLM
LRM
75
3
0
17 Dec 2024
IDEA-Bench: How Far are Generative Models from Professional Designing?
C. Liang
Lianghua Huang
Jingwu Fang
Huanzhang Dou
Wei Wang
Zhi-Fan Wu
Yupeng Shi
Junge Zhang
Xin Zhao
Yu Liu
3DV
77
1
0
16 Dec 2024
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
Shengqiong Wu
Hao Fei
Liangming Pan
William Yang Wang
Shuicheng Yan
Tat-Seng Chua
LRM
59
1
0
15 Dec 2024
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong
Chengyao Wang
Yuqi Liu
Senqiao Yang
Longxiang Tang
...
Shaozuo Yu
Sitong Wu
Eric Lo
Shu-Lin Liu
Jiaya Jia
AuLLM
100
6
0
12 Dec 2024
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip H. S. Torr
VLM
ObjD
112
0
0
12 Dec 2024
Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning
Yingyi Ma
Zhe Liu
Ozlem Kalinli
65
0
0
09 Dec 2024
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Kaixiong Gong
Kaituo Feng
B. Li
Yibing Wang
Mofan Cheng
...
Jiaming Han
Benyou Wang
Yutong Bai
Z. Yang
Xiangyu Yue
MLLM
AuLLM
VLM
79
5
0
03 Dec 2024
AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models
Yutong Zhou
Masahiro Ryo
64
0
0
30 Nov 2024
DuetML: Human-LLM Collaborative Machine Learning Framework for Non-Expert Users
Wataru Kawabe
Yusuke Sugano
HAI
61
0
0
28 Nov 2024
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou
Jiachun Jin
Chang Liu
Ye Ma
Jian Jia
Quan Chen
Peng Jiang
Zhijie Deng
DiffM
VGen
VLM
107
5
0
28 Nov 2024
MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension
Zeyu Ling
Bo Han
Shiyang Li
H. Shen
Jikang Cheng
Changqing Zou
79
1
0
26 Nov 2024
Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge
Ruiyang Qin
Dancheng Liu
Gelei Xu
Zheyu Yan
Chenhui Xu
Yuting Hu
Xiaolin Hu
Jinjun Xiong
Yiyu Shi
AuLLM
101
1
0
21 Nov 2024
Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
Xuannan Liu
Xing Cui
Peipei Li
Zekun Li
Huaibo Huang
Shuhan Xia
Miaoxuan Zhang
Yueying Zou
Ran He
AAML
58
6
0
14 Nov 2024
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
52
2
0
14 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
M. Zhang
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
46
9
0
08 Nov 2024
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
Jingwei Xu
Chenyu Wang
Zibo Zhao
Wen Liu
Yi-An Ma
Shenghua Gao
50
11
0
07 Nov 2024
Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey
Ao Fu
Yi Zhou
Tao Zhou
Y. Yang
Bojun Gao
Qun Li
Guobin Wu
Ling Shao
VGen
56
2
0
05 Nov 2024
Generative Emotion Cause Explanation in Multimodal Conversations
Lin Wang
Xiaocui Yang
Shi Feng
Daling Wang
Yifei Zhang
23
0
0
01 Nov 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Xiufeng Song
Xiao Guo
J. Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
DiffM
VGen
67
8
0
31 Oct 2024
Analyzing Multimodal Interaction Strategies for LLM-Assisted Manipulation of 3D Scenes
Junlong Chen
Jens Grubert
Per Ola Kristensson
26
1
0
29 Oct 2024
A Hierarchical Language Model For Interpretable Graph Reasoning
Sambhav Khurana
Xiner Li
Shurui Gui
Shuiwang Ji
LRM
31
0
0
29 Oct 2024
CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart
Bowen Zhao
Tianhao Cheng
Yuejie Zhang
Ying Cheng
Rui Feng
Xiaobo Zhang
LMTD
18
1
0
28 Oct 2024
What Factors Affect Multi-Modal In-Context Learning? An In-Depth Exploration
L. Qin
Qiguang Chen
Hao Fei
Zhi Chen
Min Li
Wanxiang Che
34
5
0
27 Oct 2024
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation
Maohao Shen
Shun Zhang
Jilong Wu
Zhiping Xiu
Ehab AlBadawy
Yiting Lu
M. Seltzer
Qing He
33
2
0
27 Oct 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
23
0
0
26 Oct 2024
Watermarking Large Language Models and the Generated Content: Opportunities and Challenges
Ruisi Zhang
F. Koushanfar
WaLM
36
0
0
24 Oct 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
L. Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
44
3
0
24 Oct 2024
Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image
Yu Zhao
Hao Fei
Xiangtai Li
L. Qin
Jiayi Ji
Hongyuan Zhu
Meishan Zhang
M. Zhang
Jianguo Wei
DiffM
26
1
0
20 Oct 2024
Leveraging Large Language Models for Enhancing Public Transit Services
Jiahao Wang
Amer Shalaby
22
0
0
18 Oct 2024
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Rongyao Fang
Chengqi Duan
Kun Wang
Hao Li
H. Tian
Xingyu Zeng
Rui Zhao
Jifeng Dai
Hongsheng Li
Xihui Liu
MLLM
34
11
0
17 Oct 2024
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Chengyue Wu
Xiaokang Chen
Z. F. Wu
Yiyang Ma
Xingchao Liu
...
Wen Liu
Zhenda Xie
Xingkai Yu
Chong Ruan
Ping Luo
AI4TS
49
70
0
17 Oct 2024
Roadmap towards Superhuman Speech Understanding using Large Language Models
Fan Bu
Yuhao Zhang
X. Wang
Benyou Wang
Q. Liu
H. Li
LM&MA
ELM
AuLLM
36
1
0
17 Oct 2024
A Survey on Data Synthesis and Augmentation for Large Language Models
Ke Wang
Jiahui Zhu
Minjie Ren
Z. Liu
Shiwei Li
...
Chenkai Zhang
Xiaoyu Wu
Qiqi Zhan
Qingjie Liu
Yunhong Wang
SyDa
36
15
0
16 Oct 2024
Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution
Timothy Wei
Hsien Xin Peng
Elaine Xu
Bryan Zhao
Lei Ding
Diji Yang
18
0
0
16 Oct 2024