Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.08583
Cited By
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
11 July 2024
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective"
45 / 45 papers shown
Title
Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling
Cong Xu
Gayathri Saranathan
Mahammad Parwez Alam
Arpit Shah
James Lim
Soon Yee Wong
Foltin Martin
Suparna Bhattacharya
VLM
27
1
0
21 Jun 2024
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
Lin Long
Rui Wang
Ruixuan Xiao
Junbo Zhao
Xiao Ding
Gang Chen
Haobo Wang
SyDa
45
88
0
14 Jun 2024
Multimodal Reasoning with Multimodal Knowledge Graph
Junlin Lee
Yequan Wang
Jing Li
Min Zhang
21
14
0
04 Jun 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
36
44
0
17 May 2024
Language-Image Models with 3D Understanding
Jang Hyun Cho
B. Ivanovic
Yulong Cao
Edward Schmerling
Yue Wang
...
Boyi Li
Yurong You
Philipp Krahenbuhl
Yan Wang
Marco Pavone
LRM
32
3
0
06 May 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
68
136
0
29 Apr 2024
What Makes Multimodal In-Context Learning Work?
Folco Bertini Baldassini
Mustafa Shukor
Matthieu Cord
Laure Soulier
Benjamin Piwowarski
19
4
0
24 Apr 2024
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation
Xun Wu
Shaohan Huang
Furu Wei
29
8
0
23 Apr 2024
How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning?
Yang Luo
Zangwei Zheng
Zirui Zhu
Yang You
33
5
0
19 Apr 2024
MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale
Xiaotang Gai
Chenyi Zhou
Jiaxiang Liu
Yang Feng
Jian Wu
Zuo-Qiang Liu
MedIm
23
6
0
18 Apr 2024
Aligning Actions and Walking to LLM-Generated Textual Descriptions
Radu Chivereanu
Adrian Cosma
Andy Catruna
R. Rughinis
I. Radoi
41
2
0
18 Apr 2024
Fewer Truncations Improve Language Modeling
Hantian Ding
Zijian Wang
Giovanni Paolini
Varun Kumar
Anoop Deoras
Dan Roth
Stefano Soatto
48
13
0
16 Apr 2024
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
Yipo Huang
Xiangfei Sheng
Zhichao Yang
Quan Yuan
Zhichao Duan
Pengfei Chen
Leida Li
Weisi Lin
Guangming Shi
23
18
0
15 Apr 2024
Extract, Define, Canonicalize: An LLM-based Framework for Knowledge Graph Construction
Bowen Zhang
Harold Soh
24
2
0
05 Apr 2024
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Vishaal Udandarao
Ameya Prabhu
Adhiraj Ghosh
Yash Sharma
Philip H. S. Torr
Adel Bibi
Samuel Albanie
Matthias Bethge
VLM
98
43
0
04 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
40
31
0
29 Mar 2024
Improved Baselines for Data-efficient Perceptual Augmentation of LLMs
Théophane Vallaeys
Mustafa Shukor
Matthieu Cord
Jakob Verbeek
42
12
0
20 Mar 2024
Towards Multimodal In-Context Learning for Vision & Language Models
Sivan Doveh
Shaked Perek
M. Jehanzeb Mirza
Wei Lin
Amit Alfassy
Assaf Arbelle
S. Ullman
Leonid Karlinsky
VLM
96
13
0
19 Mar 2024
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
Xueqing Wu
Rui Zheng
Jingzhen Sha
Te-Lin Wu
Hanyu Zhou
Mohan Tang
Kai-Wei Chang
Nanyun Peng
Haoran Huang
39
1
0
04 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
67
177
0
29 Feb 2024
All in an Aggregated Image for In-Image Learning
Lei Wang
Wanyu Xu
Zhiqiang Hu
Yihuai Lan
Shan Dong
Hao Wang
Roy Ka-Wei Lee
Ee-Peng Lim
VLM
35
1
0
28 Feb 2024
The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative
Zhen Tan
Chengshuai Zhao
Raha Moraffah
Yifan Li
Yu Kong
Tianlong Chen
Huan Liu
31
6
0
20 Feb 2024
On the Convergence of Zeroth-Order Federated Tuning for Large Language Models
Zhenqing Ling
Daoyuan Chen
Liuyi Yao
Yaliang Li
Ying Shen
FedML
35
12
0
08 Feb 2024
A Survey on Safe Multi-Modal Learning System
Tianyi Zhao
Liangliang Zhang
Yao Ma
Lu Cheng
44
7
0
08 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
116
106
0
08 Feb 2024
When Large Language Models Meet Vector Databases: A Survey
Zhi Jing
Yongye Su
Yikun Han
Bo Yuan
Haiyun Xu
Chunjiang Liu
Kehai Chen
Min Zhang
31
32
0
30 Jan 2024
Red Teaming Visual Language Models
Mukai Li
Lei Li
Yuwei Yin
Masood Ahmed
Zhenguang Liu
Qi Liu
VLM
25
11
0
23 Jan 2024
Generative Multi-Modal Knowledge Retrieval with Large Language Models
Xinwei Long
Jiali Zeng
Fandong Meng
Zhiyuan Ma
Kaiyan Zhang
Bowen Zhou
Jie Zhou
35
6
0
16 Jan 2024
A Survey of Resource-efficient LLM and Multimodal Foundation Models
Mengwei Xu
Wangsong Yin
Dongqi Cai
Rongjie Yi
Daliang Xu
...
Shangguang Wang
Yuanchun Li
Yunxin Liu
Xin Jin
Xuanzhe Liu
VLM
64
70
0
16 Jan 2024
Aligned with LLM: a new multi-modal training paradigm for encoding fMRI activity in visual cortex
Shuxiao Ma
Linyuan Wang
Senbao Hou
Bin Yan
MLLM
22
1
0
08 Jan 2024
Multimodal Data Curation via Object Detection and Filter Ensembles
Tzu-Heng Huang
Changho Shin
Sui Jiet Tay
Dyah Adila
Frederic Sala
26
2
0
05 Jan 2024
Silkie: Preference Distillation for Large Visual Language Models
Lei Li
Zhihui Xie
Mukai Li
Shunian Chen
Peiyi Wang
Liang Chen
Yazheng Yang
Benyou Wang
Lingpeng Kong
MLLM
96
67
0
17 Dec 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
M. Steyvers
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
125
176
0
01 Dec 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
A. Blattmann
Tim Dockhorn
Sumith Kulal
Daniel Mendelevitch
Maciej Kilian
...
Zion English
Vikram S. Voleti
Adam Letts
Varun Jampani
Robin Rombach
VGen
150
985
0
25 Nov 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
114
367
0
07 Nov 2023
LMDX: Language Model-based Document Information Extraction and Localization
Vincent Perot
Kai Kang
Florian Luisier
Guolong Su
Xiaoyu Sun
...
Zifeng Wang
Jiaqi Mu
Hao Zhang
Chen-Yu Lee
Nan Hua
40
29
0
19 Sep 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li
Zhe Gan
Zhengyuan Yang
Jianwei Yang
Linjie Li
Lijuan Wang
Jianfeng Gao
MLLM
105
221
0
18 Sep 2023
On the Adversarial Robustness of Multi-Modal Foundation Models
Christian Schlarmann
Matthias Hein
AAML
90
45
0
21 Aug 2023
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
Viorica Puatruaucean
Lucas Smaira
Ankush Gupta
Adrià Recasens Continente
L. Markeeva
...
Y. Aytar
Simon Osindero
Dima Damen
Andrew Zisserman
João Carreira
VLM
102
138
0
23 May 2023
ChatGPT as your Personal Data Scientist
Md. Mahadi Hassan
Alex Knipper
Shubhra (Santu) Karmaker
LM&MA
LLMAG
AI4CE
27
9
0
23 May 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Audio Retrieval with WavText5K and CLAP Training
Soham Deshmukh
Benjamin Elizalde
Huaming Wang
3DV
CLIP
105
50
0
28 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
2,875
0
11 Feb 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
1