ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.01390
  4. Cited By
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive
  Vision-Language Models

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

2 August 2023
Anas Awadalla
Irena Gao
Josh Gardner
Jack Hessel
Yusuf Hanafy
Wanrong Zhu
Kalyani Marathe
Yonatan Bitton
S. Gadre
Shiori Sagawa
J. Jitsev
Simon Kornblith
Pang Wei Koh
Gabriel Ilharco
Mitchell Wortsman
Ludwig Schmidt
    MLLM
ArXivPDFHTML

Papers citing "OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models"

50 / 335 papers shown
Title
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Xingjun Ma
James Bailey
AAML
34
0
0
08 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
52
0
0
03 May 2025
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
Minh-Hao Van
Xintao Wu
VLM
81
0
0
30 Apr 2025
Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
Akshat Ramachandran
Souvik Kundu
Arnab Raha
Shamik Kundu
Deepak K. Mathaikutty
Tushar Krishna
27
1
0
19 Apr 2025
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Rongtao Xu
J. Zhang
Minghao Guo
Youpeng Wen
H. Yang
...
Liqiong Wang
Yuxuan Kuang
Meng Cao
Feng Zheng
Xiaodan Liang
40
3
0
17 Apr 2025
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Zhanglin Wu
Tengfei Song
Ning Xie
Weidong Zhang
Mengli Zhu
...
Pengfei Li
C. Li
Junhao Zhu
Hao-Yu Yang
Shiliang Sun
26
1
0
16 Apr 2025
LangPert: Detecting and Handling Task-level Perturbations for Robust Object Rearrangement
LangPert: Detecting and Handling Task-level Perturbations for Robust Object Rearrangement
Xu Yin
Min-Sung Yoon
Yuchi Huo
Kang Zhang
Sung-eui Yoon
26
0
0
14 Apr 2025
SilVar-Med: A Speech-Driven Visual Language Model for Explainable Abnormality Detection in Medical Imaging
SilVar-Med: A Speech-Driven Visual Language Model for Explainable Abnormality Detection in Medical Imaging
Tan-Hanh Pham
Chris Ngo
Trong-Duong Bui
Minh Luu Quang
Tan-Huong Pham
Truong Son-Hy
27
0
0
14 Apr 2025
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
Jiansheng Li
Xingxuan Zhang
Hao Zou
Yige Guo
Renzhe Xu
Yilong Liu
Chuzhao Zhu
Yue He
Peng Cui
VLM
37
0
0
14 Apr 2025
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
Yuxiang Lin
Jingdong Sun
Zhi-Qi Cheng
Jue Wang
Haomin Liang
Zebang Cheng
Yifei Dong
Jun-Yan He
Xiaojiang Peng
Xian-Sheng Hua
41
0
0
10 Apr 2025
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao
Xuqi Liu
Zhongqi Yue
Y. Wu
Shuang Chen
Juncheng Billy Li
Siliang Tang
Fei Wu
Tat-Seng Chua
Yueting Zhuang
OffRL
LRM
39
1
0
09 Apr 2025
Classifying the Unknown: In-Context Learning for Open-Vocabulary Text and Symbol Recognition
Classifying the Unknown: In-Context Learning for Open-Vocabulary Text and Symbol Recognition
Tom Simon
William Mocaer
Pierrick Tranouez
Clément Chatelain
Thierry Paquet
MLLM
VLM
51
0
0
09 Apr 2025
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking
Junxi Chen
Junhao Dong
Xiaohua Xie
33
0
0
08 Apr 2025
M2IV: Towards Efficient and Fine-grained Multimodal In-Context Learning in Large Vision-Language Models
M2IV: Towards Efficient and Fine-grained Multimodal In-Context Learning in Large Vision-Language Models
Yanshu Li
Hongyang He
Yi Cao
Qisen Cheng
Xiang Fu
Ruixiang Tang
VLM
40
0
0
06 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
64
0
0
03 Apr 2025
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Chaohu Liu
Tianyi Gui
Yu Liu
Linli Xu
VLM
AAML
68
1
0
02 Apr 2025
RoboAct-CLIP: Video-Driven Pre-training of Atomic Action Understanding for Robotics
RoboAct-CLIP: Video-Driven Pre-training of Atomic Action Understanding for Robotics
Zhiyuan Zhang
Yuxin He
Yong Sun
Junyu Shi
Lijiang Liu
Qiang Nie
VLM
44
0
0
02 Apr 2025
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Yuejiao Su
Yi Wang
Qiongyang Hu
Chuang Yang
Lap-Pui Chau
45
0
0
02 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Z. Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
92
3
0
01 Apr 2025
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features
Jewon Lee
Ki-Ung Song
Seungmin Yang
Donguk Lim
Jaeyeon Kim
Wooksu Shin
Bo-Kyeong Kim
Yong Jae Lee
Tae-Ho Kim
VLM
55
0
0
01 Apr 2025
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
Xiaoqin Wang
Xusen Ma
Xianxu Hou
Meidan Ding
Yudong Li
Junliang Chen
Wenting Chen
Xiaoyang Peng
LinLin Shen
CVBM
71
0
0
27 Mar 2025
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Yue Li
Meng Tian
Zhenyu Lin
Jiangtong Zhu
Dechang Zhu
Haiqiang Liu
Zining Wang
Yueyi Zhang
Zhiwei Xiong
Xinhai Zhao
CoGe
VLM
80
1
0
27 Mar 2025
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Xiao Guo
Xiufeng Song
Yue Zhang
Xiaohong Liu
X. Liu
56
1
0
26 Mar 2025
RoboFlamingo-Plus: Fusion of Depth and RGB Perception with Vision-Language Models for Enhanced Robotic Manipulation
RoboFlamingo-Plus: Fusion of Depth and RGB Perception with Vision-Language Models for Enhanced Robotic Manipulation
Sheng Wang
VLM
76
2
0
25 Mar 2025
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
Luyao Tang
Yuxuan Yuan
C. L. P. Chen
Zeyu Zhang
Yue Huang
Kun Zhang
48
0
0
24 Mar 2025
P3Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction
P3Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction
Yufeng Zhong
Chengjian Feng
Feng Yan
Fanfan Liu
Liming Zheng
Lin Ma
49
0
0
24 Mar 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu
Bing Li
Cheng Zheng
Jinjie Mai
Jun-Cheng Chen
...
Abdullah Hamdi
Sara Rojas Martinez
Chia-Wen Lin
Mohamed Elhoseiny
Bernard Ghanem
VLM
48
0
0
22 Mar 2025
Leveraging OpenFlamingo for Multimodal Embedding Analysis of C2C Car Parts Data
Leveraging OpenFlamingo for Multimodal Embedding Analysis of C2C Car Parts Data
Maisha Binte Rashid
Pablo Rivas
34
0
0
20 Mar 2025
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Zenghui Yuan
Jiawen Shi
Pan Zhou
Neil Zhenqiang Gong
Lichao Sun
AAML
66
1
0
20 Mar 2025
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
Felix Chen
Hangjie Yuan
Yunqiu Xu
Tao Feng
Jun Cen
Pengwei Liu
Zeying Huang
Yi Yang
LRM
42
1
0
19 Mar 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Jing Zhang
58
0
0
18 Mar 2025
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger
Nina Wenzel
David Griffiths
Haiming Gang
Justin Lazarow
...
Kai Kang
Marcin Eichner
Y. Yang
Afshin Dehghan
Peter Grasch
72
2
0
17 Mar 2025
PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
PASTA: Part-Aware Sketch-to-3D Shape Generation with Text-Aligned Prior
S. Lee
Hwanhee Jung
Byoungsoo Koh
Qixing Huang
Sangho Yoon
Sangpil Kim
44
0
0
17 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Y. Wang
Shengqiong Wu
Y. Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
87
7
0
16 Mar 2025
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning
Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning
Yiwei Chen
Yuguang Yao
Yihua Zhang
Bingquan Shen
Gaowen Liu
Sijia Liu
AAML
MU
58
1
0
14 Mar 2025
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren
Wentao Ma
Huan Yang
Cong Wei
Ge Zhang
Wenhu Chen
Mamba
57
3
0
14 Mar 2025
Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
Haoxuan Li
Sixu Yan
Y. Li
Xinggang Wang
LM&Ro
59
0
0
13 Mar 2025
Teaching LMMs for Image Quality Scoring and Interpreting
Zicheng Zhang
H. Wu
Ziheng Jia
Weisi Lin
Guangtao Zhai
60
1
0
12 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad
Vibhav Vineet
Y. S. Rawat
VLM
84
1
0
11 Mar 2025
Evaluation of Safety Cognition Capability in Vision-Language Models for Autonomous Driving
Enming Zhang
Peizhe Gong
Xingyuan Dai
Yisheng Lv
Q. Miao
MLLM
ELM
60
0
0
09 Mar 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu
Yinghao Chen
Renshou Wu
Haohao Gao
Xi Chen
...
Fangyuan Li
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
68
0
0
08 Mar 2025
Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks
Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks
Liming Lu
Shuchao Pang
Siyuan Liang
Haotian Zhu
Xiyu Zeng
Aishan Liu
Yunhuai Liu
Yongbin Zhou
AAML
49
1
0
05 Mar 2025
Advancing Multimodal In-Context Learning in Large Vision-Language Models with Task-aware Demonstrations
Advancing Multimodal In-Context Learning in Large Vision-Language Models with Task-aware Demonstrations
Yanshu Li
44
0
0
05 Mar 2025
Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
Wenxuan Song
Jiayi Chen
Pengxiang Ding
H. Zhao
Wei Zhao
Zhide Zhong
Zongyuan Ge
Jun Ma
Haoang Li
43
3
0
04 Mar 2025
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Ailin Deng
Tri Cao
Zhirui Chen
Bryan Hooi
VLM
96
2
0
04 Mar 2025
Vision Language Models in Medicine
Beria Chingnabe Kalpelbe
Angel Gabriel Adaambiik
Wei Peng
VLM
LM&MA
86
2
0
24 Feb 2025
VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation
VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation
Wei Zhao
Pengxiang Ding
M. Zhang
Zhefei Gong
Shuanghao Bai
H. Zhao
Donglin Wang
85
5
0
24 Feb 2025
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
Tracking the Copyright of Large Vision-Language Models through Parameter Learning Adversarial Images
Yubo Wang
Jianting Tang
Chaohu Liu
Linli Xu
AAML
51
1
0
23 Feb 2025
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education
Yi-Fan Zhang
Hang Li
D. Song
Lichao Sun
Tianlong Xu
Qingsong Wen
LLMAG
LRM
85
2
0
20 Feb 2025
DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities
DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities
Chashi Mahiul Islam
Samuel Jacob Chacko
Preston Horne
Xiuwen Liu
97
0
0
11 Feb 2025
1234567
Next