Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.01390
Cited By
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
2 August 2023
Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
Wanrong Zhu
Kalyani Marathe
Yonatan Bitton
S. Gadre
Shiori Sagawa
J. Jitsev
Simon Kornblith
Pang Wei Koh
Gabriel Ilharco
Mitchell Wortsman
Ludwig Schmidt
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models"
50 / 335 papers shown
Title
FoodLMM: A Versatile Food Assistant using Large Multi-modal Model
Yuehao Yin
Huiyan Qi
B. Zhu
Jingjing Chen
Yu-Gang Jiang
Chong-Wah Ngo
13
18
0
22 Dec 2023
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain
Jianwei Yang
Humphrey Shi
MLLM
11
24
0
21 Dec 2023
Silkie: Preference Distillation for Large Visual Language Models
Lei Li
Zhihui Xie
Mukai Li
Shunian Chen
Peiyi Wang
Liang Chen
Yazheng Yang
Benyou Wang
Lingpeng Kong
MLLM
101
68
0
17 Dec 2023
Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
Xu Yang
Yingzhe Peng
Haoxuan Ma
Shuo Xu
Chi Zhang
Yucheng Han
Hanwang Zhang
30
5
0
15 Dec 2023
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri
Tianjun Feng
Anh Nguyen
C. Bezemer
MLLM
VLM
LRM
30
9
0
08 Dec 2023
Towards Knowledge-driven Autonomous Driving
Xin Li
Yeqi Bai
Pinlong Cai
Licheng Wen
Daocheng Fu
...
Yikang Li
Botian Shi
Yong-Jin Liu
Liang He
Yu Qiao
32
26
0
07 Dec 2023
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi
Ye Fang
Zeyi Sun
Xiaoyang Wu
Tong Wu
Jiaqi Wang
Dahua Lin
Hengshuang Zhao
MLLM
71
35
0
05 Dec 2023
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai
Zirui Song
Dayan Guan
Zhenhao Chen
Xing Luo
Chenyu Yi
Alex C. Kot
MLLM
VLM
20
31
0
05 Dec 2023
Towards More Unified In-context Visual Understanding
Dianmo Sheng
Dongdong Chen
Zhentao Tan
Qiankun Liu
Qi Chu
Jianmin Bao
Tao Gong
Bin Liu
Shengwei Xu
Nenghai Yu
MLLM
VLM
24
10
0
05 Dec 2023
Object Recognition as Next Token Prediction
Kaiyu Yue
Borchun Chen
Jonas Geiping
Hengduo Li
Tom Goldstein
Ser-Nam Lim
22
8
0
04 Dec 2023
InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
Xunguang Wang
Zhenlan Ji
Pingchuan Ma
Zongjie Li
Shuai Wang
MLLM
30
11
0
04 Dec 2023
How to Configure Good In-Context Sequence for Visual Question Answering
Li Li
Jiawei Peng
Huiyi Chen
Chongyang Gao
Xu Yang
MLLM
15
20
0
04 Dec 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
M. Steyvers
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
130
177
0
01 Dec 2023
Dolphins: Multimodal Language Model for Driving
Yingzi Ma
Yulong Cao
Jiachen Sun
Marco Pavone
Chaowei Xiao
MLLM
21
49
0
01 Dec 2023
Merlin:Empowering Multimodal LLMs with Foresight Minds
En Yu
Liang Zhao
Yana Wei
Jinrong Yang
Dongming Wu
...
Haoran Wei
Tiancai Wang
Zheng Ge
Xiangyu Zhang
Wenbing Tao
LRM
10
25
0
30 Nov 2023
Understanding and Improving In-Context Learning on Vision-language Models
Shuo Chen
Zhen Han
Bailan He
Mark Buckley
Philip H. S. Torr
Volker Tresp
Jindong Gu
25
6
0
29 Nov 2023
DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations
Maximilian Augustin
Yannic Neuhaus
Matthias Hein
DiffM
19
3
0
29 Nov 2023
Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?
Xiujun Li
Yujie Lu
Zhe Gan
Jianfeng Gao
William Yang Wang
Yejin Choi
VLM
MLLM
28
1
0
29 Nov 2023
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu
Yichen Zhu
Jindong Gu
Yunshi Lan
Chao Yang
Yu Qiao
19
80
0
29 Nov 2023
SEED-Bench-2: Benchmarking Multimodal Large Language Models
Bohao Li
Yuying Ge
Yixiao Ge
Guangzhi Wang
Rui Wang
Ruimao Zhang
Ying Shan
MLLM
VLM
23
67
0
28 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
23
58
0
27 Nov 2023
Visual cognition in multimodal large language models
Luca M. Schulze Buschoff
Elif Akata
Matthias Bethge
Eric Schulz
LRM
49
12
0
27 Nov 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
66
729
0
27 Nov 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Janet Liu
EgoV
LRM
21
15
0
27 Nov 2023
Robot Learning in the Era of Foundation Models: A Survey
Xuan Xiao
Jiahang Liu
Zhipeng Wang
Yanmin Zhou
Yong Qi
Qian Cheng
Bin He
Shuo Jiang
AI4CE
LM&Ro
16
26
0
24 Nov 2023
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
Wentao Ge
Shunian Chen
Guiming Hardy Chen
Zhihong Chen
Junying Chen
...
Anningzhe Gao
Zhiyi Zhang
Jianquan Li
Xiang Wan
Benyou Wang
MLLM
44
6
0
23 Nov 2023
MAIRA-1: A specialised large multimodal model for radiology report generation
Stephanie L. Hyland
Shruthi Bannur
Kenza Bouzid
Daniel Coelho De Castro
M. Ranjit
...
Noel Codella
M. Lungren
Maria T. A. Wetscherek
Ozan Oktay
Javier Alvarez-Valle
29
47
0
22 Nov 2023
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs
Yonghui Wang
Wen-gang Zhou
Hao Feng
Keyi Zhou
Houqiang Li
50
18
0
22 Nov 2023
InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models
Xiaotian Han
Quanzeng You
Yongfei Liu
Wentao Chen
Huangjie Zheng
...
Yiqi Wang
Bohan Zhai
Jianbo Yuan
Heng Wang
Hongxia Yang
ReLM
LRM
ELM
45
9
0
20 Nov 2023
Vision-Language Instruction Tuning: A Review and Analysis
Chen Li
Yixiao Ge
Dian Li
Ying Shan
VLM
31
12
0
14 Nov 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin
Ryuichi Takanobu
Caiwan Zhang
Xiaochun Cao
Li-ming Yuan
MLLM
34
222
0
14 Nov 2023
What Large Language Models Bring to Text-rich VQA?
Xuejing Liu
Wei Tang
Xinzhe Ni
Jinghui Lu
Rui Zhao
Zechao Li
Fei Tan
17
9
0
13 Nov 2023
Detecting and Correcting Hate Speech in Multimodal Memes with Large Visual Language Model
Minh-Hao Van
Xintao Wu
VLM
MLLM
23
10
0
12 Nov 2023
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Zhang Li
Biao Yang
Qiang Liu
Zhiyin Ma
Shuo Zhang
Jingxu Yang
Yabo Sun
Yuliang Liu
Xiang Bai
MLLM
36
241
0
11 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
66
4
0
10 Nov 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
116
375
0
07 Nov 2023
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
Junyan Li
Delin Chen
Yining Hong
Zhenfang Chen
Peihao Chen
Yikang Shen
Chuang Gan
MLLM
13
14
0
06 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM
MLLM
17
445
0
06 Nov 2023
Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review
Mingze Yuan
Peng Bao
Jiajia Yuan
Yunhao Shen
Zi Chen
...
Jie Zhao
Yang Chen
Li Zhang
Lin Shen
Bin Dong
ELM
LM&MA
41
13
0
03 Nov 2023
Vision-Language Foundation Models as Effective Robot Imitators
Xinghang Li
Minghuan Liu
Hanbo Zhang
Cunjun Yu
Jie Xu
...
Ya Jing
Weinan Zhang
Huaping Liu
Hang Li
Tao Kong
LM&Ro
21
134
0
02 Nov 2023
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei
Chenxi Liu
Siyuan Qiao
Zhishuai Zhang
Alan Yuille
Jiahui Yu
VLM
DiffM
29
10
0
01 Nov 2023
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
54
106
0
26 Oct 2023
Exploring Question Decomposition for Zero-Shot VQA
Zaid Khan
B. Vijaykumar
S. Schulter
Manmohan Chandraker
Yun Fu
ReLM
17
10
0
25 Oct 2023
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
Joy Hsu
Jiayuan Mao
Joshua B. Tenenbaum
Jiajun Wu
VLM
ReLM
LRM
18
21
0
24 Oct 2023
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models
Mingwei Zhu
Leigang Sha
Yu Shu
Kangjia Zhao
Tiancheng Zhao
Jianwei Yin
LRM
22
0
0
20 Oct 2023
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
Sipeng Zheng
Jiazheng Liu
Yicheng Feng
Zongqing Lu
34
29
0
20 Oct 2023
LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation
Shengqiang Zhang
Philipp Wicke
Lutfi Kerem Senel
Luis F. C. Figueredo
Abdeldjallil Naceri
Sami Haddadin
Barbara Plank
Hinrich Schütze
LM&Ro
23
10
0
18 Oct 2023
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Jinbo Xing
Menghan Xia
Yong Zhang
Haoxin Chen
Wangbo Yu
Hanyuan Liu
Xintao Wang
Tien-Tsin Wong
Ying Shan
VGen
28
199
0
18 Oct 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
147
144
0
16 Oct 2023
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Jingkang Yang
Yuhao Dong
Shuai Liu
Bo-wen Li
Ziyue Wang
...
Haoran Tan
Jiamu Kang
Yuanhan Zhang
Kaiyang Zhou
Ziwei Liu
LM&Ro
39
45
0
12 Oct 2023
Previous
1
2
3
4
5
6
7
Next