ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.02239
  4. Cited By
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative
  Vokens

MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

3 October 2023
Kaizhi Zheng
Xuehai He
Xin Eric Wang
    MLLM
ArXivPDFHTML

Papers citing "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"

50 / 74 papers shown
Title
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
Mohamed Gado
Towhid Taliee
Muhammad Memon
D. Ignatov
Radu Timofte
63
0
0
27 Apr 2025
A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms
A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms
Chengkai Huang
Hongtao Huang
Tong Yu
Kaige Xie
Junda Wu
Shuai Zhang
Julian McAuley
Dietmar Jannach
Lina Yao
LRM
AI4CE
22
0
0
23 Apr 2025
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
Wulin Xie
Y. Zhang
Chaoyou Fu
Yang Shi
Bingyan Nie
Hongkai Chen
Z. Zhang
Liang Wang
T. Tan
31
1
0
04 Apr 2025
Towards a Multimodal Document-grounded Conversational AI System for Education
Towards a Multimodal Document-grounded Conversational AI System for Education
Karan Taneja
Anjali Singh
Ashok K. Goel
22
0
0
04 Apr 2025
On Large Multimodal Models as Open-World Image Classifiers
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti
Massimiliano Mancini
Enrico Fini
Yiming Wang
Paolo Rota
Elisa Ricci
VLM
Presented at ResearchTrend Connect | VLM on 07 May 2025
74
0
0
27 Mar 2025
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens
Panpan Wang
Liqiang Niu
Fandong Meng
Jinan Xu
Yufeng Chen
Jie Zhou
DiffM
45
0
0
21 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Y. Wang
Shengqiong Wu
Y. Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
84
7
0
16 Mar 2025
BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling
BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling
Hao Li
Yu Huang
Chang Xu
Viktor Schlegel
Ren-He Jiang
R. Batista-Navarro
Goran Nenadic
Jiang Bian
DiffM
AI4CE
88
3
0
04 Mar 2025
Language Model Mapping in Multimodal Music Learning: A Grand Challenge Proposal
Daniel Y. Chin
Gus Xia
34
0
0
01 Mar 2025
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
L. Chen
S. Bai
Wenhao Chai
Weichu Xie
Haozhe Zhao
Leon Vinci
Junyang Lin
Baobao Chang
DiffM
82
4
0
27 Feb 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
83
10
0
06 Jan 2025
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Hao Fei
Shengqiong Wu
H. Zhang
Tat-Seng Chua
Shuicheng Yan
59
35
0
31 Dec 2024
An Empirical Analysis on Spatial Reasoning Capabilities of Large
  Multimodal Models
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
Fatemeh Shiri
Xiao-Yu Guo
Mona Golestan Far
Xin-Yao Yu
Gholamreza Haffari
Yuan-Fang Li
LRM
20
8
0
09 Nov 2024
MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained
  Vision-Language Understanding
MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding
Yue Cao
Yangzhou Liu
Zhe Chen
Guangchen Shi
Wenhai Wang
Danhuai Zhao
Tong Lu
41
5
0
15 Oct 2024
Free Video-LLM: Prompt-guided Visual Perception for Efficient
  Training-free Video LLMs
Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs
Kai Han
Jianyuan Guo
Yehui Tang
W. He
Enhua Wu
Yunhe Wang
MLLM
VLM
21
3
0
14 Oct 2024
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia
Siwei Han
Shi Qiu
Yiyang Zhou
Zhaoyang Wang
...
Chenhang Cui
Mingyu Ding
Linjie Li
Lijuan Wang
Huaxiu Yao
49
10
0
14 Oct 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal
  Reasoning with Large Language Models
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
34
1
0
19 Sep 2024
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
34
1
0
13 Sep 2024
SITransformer: Shared Information-Guided Transformer for Extreme
  Multimodal Summarization
SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization
Sicheng Liu
Lintao Wang
Xiaogan Zhu
Xuequan Lu
Zhiyong Wang
Kun Hu
37
0
0
28 Aug 2024
BI-MDRG: Bridging Image History in Multimodal Dialogue Response
  Generation
BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
Hee Suk Yoon
Eunseop Yoon
Joshua Tian Jin Tee
Kang Zhang
Yu-Jung Heo
Du-Seong Chang
Chang D. Yoo
24
3
0
12 Aug 2024
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware
  Open-domain Visual Storytelling
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
Zilyu Ye
Jinxiu Liu
Ruotian Peng
Jinjin Cao
Zhiyang Chen
...
Mingyuan Zhou
Xiaoqian Shen
Mohamed Elhoseiny
Qi Liu
Guo-Jun Qi
VGen
VLM
26
1
0
07 Aug 2024
Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal
  Large Language Models
Piculet: Specialized Models-Guided Hallucination Decrease for MultiModal Large Language Models
Afia Anjum
Xiang Liu
Zhaoxiang Liu
Kai Wang
Shiguo Lian
VLM
MLLM
31
0
0
02 Aug 2024
Harmonizing Visual Text Comprehension and Generation
Harmonizing Visual Text Comprehension and Generation
Zhen Zhao
Jingqun Tang
Binghong Wu
Chunhui Lin
Shubo Wei
Hao Liu
Xin Tan
Zhizhong Zhang
Can Huang
Yuan Xie
VLM
26
21
0
23 Jul 2024
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation
  for Video Moment Retrieval
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
Yiyang Jiang
Wengyu Zhang
Xu-Lu Zhang
Xiaoyong Wei
Chang Wen Chen
Qing Li
29
3
0
21 Jul 2024
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
Jie-jin Yang
Xuesong Niu
Nan Jiang
Ruimao Zhang
Siyuan Huang
25
9
0
17 Jul 2024
Stark: Social Long-Term Multi-Modal Conversation with Persona
  Commonsense Knowledge
Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge
Young-Jun Lee
Dokyong Lee
Junyoung Youn
Kyeongjin Oh
ByungSoo Ko
Jonghwan Hyeon
Ho-Jin Choi
23
2
0
04 Jul 2024
Lateralization LoRA: Interleaved Instruction Tuning with
  Modality-Specialized Adaptations
Lateralization LoRA: Interleaved Instruction Tuning with Modality-Specialized Adaptations
Zhiyang Xu
Minqian Liu
Ying Shen
Joy Rimchala
Jiaxin Zhang
Qifan Wang
Yu Cheng
Lifu Huang
VLM
37
2
0
04 Jul 2024
Holistic Evaluation for Interleaved Text-and-Image Generation
Holistic Evaluation for Interleaved Text-and-Image Generation
Minqian Liu
Zhiyang Xu
Zihao Lin
Trevor Ashby
Joy Rimchala
Jiaxin Zhang
Lifu Huang
EGVM
36
7
0
20 Jun 2024
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen
Lin Li
Yongqi Yang
Bin Wen
Fan Yang
Tingting Gao
Yu Wu
Long Chen
VLM
VGen
43
6
0
15 Jun 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot
  Audio Task Learner
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Dongchao Yang
Haohan Guo
Yuanyuan Wang
Rongjie Huang
Xiang Li
Xu Tan
Xixin Wu
Helen Meng
AuLLM
39
15
0
14 Jun 2024
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation
  in Videos
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He
Weixi Feng
Kaizhi Zheng
Yujie Lu
Wanrong Zhu
...
Zhengyuan Yang
Kevin Lin
William Yang Wang
Lijuan Wang
Xin Eric Wang
VGen
LRM
33
12
0
12 Jun 2024
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image
  Generation
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
Junhao Cheng
Xi Lu
Hanhui Li
Khun Loun Zai
Baiqiao Yin
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffM
VGen
21
10
0
03 Jun 2024
Kaleido Diffusion: Improving Conditional Diffusion Models with
  Autoregressive Latent Modeling
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling
Jiatao Gu
Ying Shen
Shuangfei Zhai
Yizhe Zhang
Navdeep Jaitly
J. Susskind
36
10
0
31 May 2024
Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot
  Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language
  Models
Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models
Tianrun Chen
Chunan Yu
Jing Li
Jianqi Zhang
Lanyun Zhu
Deyi Ji
Yong Zhang
Ying-Dong Zang
Zejian Li
Lingyun Sun
LRM
36
9
0
29 May 2024
A Survey of Multimodal Large Language Model from A Data-centric
  Perspective
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping-Chia Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
47
36
0
26 May 2024
TheaterGen: Character Management with LLM for Consistent Multi-turn
  Image Generation
TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation
Junhao Cheng
Baiqiao Yin
Kaixin Cai
Minbin Huang
Hanhui Li
...
Yue Li
Yifei Li
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffM
MLLM
29
12
0
29 Apr 2024
WorldGPT: Empowering LLM as Multimodal World Model
WorldGPT: Empowering LLM as Multimodal World Model
Zhiqi Ge
Hongzhe Huang
Mingze Zhou
Juncheng Li
Guoming Wang
Siliang Tang
Yueting Zhuang
32
26
0
28 Apr 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
21
42
0
31 Mar 2024
TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
TexDreamer: Towards Zero-Shot High-Fidelity 3D Human Texture Generation
Yufei Liu
Junwei Zhu
Junshu Tang
Shijie Zhang
Jiangning Zhang
Weijian Cao
Chengjie Wang
Yunsheng Wu
Dongjin Huang
31
8
0
19 Mar 2024
VisualCritic: Making LMMs Perceive Visual Quality Like Humans
VisualCritic: Making LMMs Perceive Visual Quality Like Humans
Zhipeng Huang
Zhizheng Zhang
Yiting Lu
Zheng-Jun Zha
Zhibo Chen
Baining Guo
MLLM
42
11
0
19 Mar 2024
Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal
  Storyteller
Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller
Chuanqi Zang
Jiji Tang
Rongsheng Zhang
Zeng Zhao
Tangjie Lv
Mingtao Pei
Wei Liang
24
3
0
12 Mar 2024
All in an Aggregated Image for In-Image Learning
All in an Aggregated Image for In-Image Learning
Lei Wang
Wanyu Xu
Zhiqiang Hu
Yihuai Lan
Shan Dong
Hao Wang
Roy Ka-Wei Lee
Ee-Peng Lim
VLM
43
1
0
28 Feb 2024
Evaluating Very Long-Term Conversational Memory of LLM Agents
Evaluating Very Long-Term Conversational Memory of LLM Agents
A. Maharana
Dong-Ho Lee
Sergey Tulyakov
Mohit Bansal
Francesco Barbieri
Yuwei Fang
LLMAG
17
66
0
27 Feb 2024
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D
  Talking Face Generation
AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation
Yasheng Sun
Wenqing Chu
Hang Zhou
Kaisiyuan Wang
Hideki Koike
24
5
0
25 Feb 2024
Uncertainty-Aware Evaluation for Vision-Language Models
Uncertainty-Aware Evaluation for Vision-Language Models
Vasily Kostumov
Bulat Nutfullin
Oleg Pilipenko
Eugene Ilyushin
ELM
40
7
0
22 Feb 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current
  Methodologies and Future Directions
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Aman Chadha
VLM
41
23
0
20 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
46
41
0
19 Feb 2024
Can MLLMs Perform Text-to-Image In-Context Learning?
Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng
Wonjun Kang
Yicong Chen
Hyung Il Koo
Kangwook Lee
MLLM
23
9
0
02 Feb 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
37
173
0
24 Jan 2024
STICKERCONV: Generating Multimodal Empathetic Responses from Scratch
STICKERCONV: Generating Multimodal Empathetic Responses from Scratch
Yiqun Zhang
Fanheng Kong
Peidong Wang
Shuang Sun
Lingshuai Wang
Shi Feng
Daling Wang
Yifei Zhang
Kaisong Song
15
10
0
20 Jan 2024
12
Next