ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.14198
  4. Cited By
Flamingo: a Visual Language Model for Few-Shot Learning

Flamingo: a Visual Language Model for Few-Shot Learning

29 April 2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
Yana Hasson
Karel Lenc
A. Mensch
Katie Millican
Malcolm Reynolds
Roman Ring
Eliza Rutherford
Serkan Cabi
Tengda Han
Zhitao Gong
Sina Samangooei
Marianne Monteiro
Jacob Menick
Sebastian Borgeaud
Andy Brock
Aida Nematzadeh
Sahand Sharifzadeh
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
    MLLM
    VLM
ArXivPDFHTML

Papers citing "Flamingo: a Visual Language Model for Few-Shot Learning"

50 / 474 papers shown
Title
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Tianyuan Qu
Longxiang Tang
Bohao Peng
Senqiao Yang
Bei Yu
Jiaya Jia
VLM
77
0
0
16 Mar 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi
Eddy Ilg
M. Keuper
Hideki Tanaka
Masao Utiyama
Raj Dabre
Steffen Eger
Simone Paolo Ponzetto
50
0
0
14 Mar 2025
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
Xiangyan Qu
Gaopeng Gou
Jiamin Zhuang
Jing Yu
Kun Song
Qihao Wang
Yili Li
Gang Xiong
VLM
75
0
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
64
0
0
13 Mar 2025
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
Jiaming Liu
Hao Chen
Pengju An
Zhuoyang Liu
Renrui Zhang
...
Chengkai Hou
Mengdi Zhao
KC alex Zhou
Pheng-Ann Heng
S. Zhang
60
5
0
13 Mar 2025
Large-scale Pre-training for Grounded Video Caption Generation
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
52
0
0
13 Mar 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Gedas Bertasius
Lorenzo Torresani
69
0
0
12 Mar 2025
Aligning Text to Image in Diffusion Models is Easier Than You Think
Aligning Text to Image in Diffusion Models is Easier Than You Think
J. Lee
Byunghee Cha
Jeongsol Kim
Jong Chul Ye
52
0
0
11 Mar 2025
Referring to Any Person
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
77
0
0
11 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad
Vibhav Vineet
Y. S. Rawat
VLM
63
1
0
11 Mar 2025
MMRL: Multi-Modal Representation Learning for Vision-Language Models
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo
Xiaodong Gu
VLM
OffRL
57
0
0
11 Mar 2025
Vision-based 3D Semantic Scene Completion via Capture Dynamic Representations
Vision-based 3D Semantic Scene Completion via Capture Dynamic Representations
Meng Wang
Fan Wu
Yunchuan Qin
Ruihui Li
Zhuo Tang
KenLi Li
3DPC
91
0
0
08 Mar 2025
Treble Counterfactual VLMs: A Causal Approach to Hallucination
Treble Counterfactual VLMs: A Causal Approach to Hallucination
Li Li
Jiashu Qu
Yuxiao Zhou
Yuehan Qin
Tiankai Yang
Yue Zhao
78
1
0
08 Mar 2025
Is Your Video Language Model a Reliable Judge?
M. Liu
Wensheng Zhang
56
1
0
07 Mar 2025
Generative Artificial Intelligence in Robotic Manipulation: A Survey
Kun Zhang
Peng Yun
Jun Cen
Junhao Cai
DiDi Zhu
...
Qifeng Chen
Jia Pan
Wei K. Zhang
Bo Yang
Hua Chen
59
1
0
05 Mar 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li
Boyang Li
CoGe
67
0
0
03 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
51
0
0
02 Mar 2025
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
Tianyi Wang
Jianan Fan
Dingxin Zhang
Dongnan Liu
Yong-quan Xia
Heng Huang
Weidong Cai
34
0
0
01 Mar 2025
PaliGemma-CXR: A Multi-task Multimodal Model for TB Chest X-ray Interpretation
Denis Musinguzi
Andrew Katumba
Sudi Murindanyi
28
0
0
28 Feb 2025
InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models
InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models
Shuchang Zhou
Jiwei Wei
Shiyuan He
Yuyang Zhou
Chaoning Zhang
Jie Zou
Ning Xie
Yang Yang
VLM
VPVLM
81
0
0
27 Feb 2025
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Zhongyang Li
Ziyue Li
Tianyi Zhou
MoE
44
0
0
27 Feb 2025
Repurposing the scientific literature with vision-language models
Repurposing the scientific literature with vision-language models
Anton Alyakin
Jaden Stryker
Daniel Alber
Karl L. Sangwon
Brandon Duderstadt
...
Laura Snyder
Eric Leuthardt
Douglas Kondziolka
E. Oermann
Eric Karl Oermann
92
0
0
26 Feb 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
66
1
0
25 Feb 2025
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
Fanhu Zeng
Haiyang Guo
Fei Zhu
Li Shen
Hao Tang
MoMe
47
1
0
24 Feb 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
69
0
0
21 Feb 2025
Pretrained Image-Text Models are Secretly Video Captioners
Pretrained Image-Text Models are Secretly Video Captioners
Chunhui Zhang
Yiren Jian
Z. Ouyang
Soroush Vosoughi
VLM
66
3
0
20 Feb 2025
Large Language Models for Anomaly and Out-of-Distribution Detection: A Survey
Large Language Models for Anomaly and Out-of-Distribution Detection: A Survey
Ruiyao Xu
Kaize Ding
50
5
0
17 Feb 2025
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
Zeqing Wang
Wentao Wan
Qiqing Lao
Runmeng Chen
Minjie Lang
Keze Wang
Liang Lin
Liang Lin
LRM
96
3
0
17 Feb 2025
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez
LRM
60
0
0
17 Feb 2025
A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards
A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards
Shivansh Patel
Xinchen Yin
Wenlong Huang
Shubham Garg
H. Nayyeri
Li Fei-Fei
Svetlana Lazebnik
Y. Li
89
0
0
12 Feb 2025
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Tzu-Tao Chang
Shivaram Venkataraman
VLM
87
0
0
04 Feb 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
93
1
0
03 Feb 2025
Position: AI Scaling: From Up to Down and Out
Position: AI Scaling: From Up to Down and Out
Yunke Wang
Yanxi Li
Chang Xu
HAI
71
1
0
02 Feb 2025
Technical report on label-informed logit redistribution for better domain generalization in low-shot classification with foundation models
Technical report on label-informed logit redistribution for better domain generalization in low-shot classification with foundation models
Behraj Khan
T. Syed
59
1
0
29 Jan 2025
MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis
Mai A. Shaaban
Adnan Khan
Mohammad Yaqub
LM&MA
78
2
0
28 Jan 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Erik Cambria
LM&MA
AILaw
85
148
0
28 Jan 2025
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Jiajie Li
Brian R Quaranto
Chenhui Xu
Ishan Mishra
Ruiyang Qin
Dancheng Liu
Peter C W Kim
Jinjun Xiong
83
0
0
25 Jan 2025
CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification
CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification
Cristiano Patrício
Isabel Rio-Torto
J. S. Cardoso
Luís F. Teixeira
João C. Neves
VLM
123
0
0
21 Jan 2025
Dynamic Scene Understanding from Vision-Language Representations
Dynamic Scene Understanding from Vision-Language Representations
Shahaf Pruss
Morris Alper
Hadar Averbuch-Elor
OCL
89
0
0
20 Jan 2025
VideoAuteur: Towards Long Narrative Video Generation
VideoAuteur: Towards Long Narrative Video Generation
Junfei Xiao
Feng Cheng
Lu Qi
Liangke Gui
Jiepeng Cen
Zhibei Ma
Alan L. Yuille
Lu Jiang
VGen
54
2
0
10 Jan 2025
MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation
MedCoDi-M: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation
Daniele Molino
Francesco Di Feola
E. Faiella
Deborah Fazzini
D. Santucci
Linlin Shen
V. Guarrasi
Paolo Soda
SyDa
MedIm
39
0
0
10 Jan 2025
OneLLM: One Framework to Align All Modalities with Language
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
D. Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
104
102
0
10 Jan 2025
MObI: Multimodal Object Inpainting Using Diffusion Models
MObI: Multimodal Object Inpainting Using Diffusion Models
Alexandru Buburuzan
Anuj Sharma
John Redford
P. Dokania
Romain Mueller
DiffM
83
1
0
06 Jan 2025
Instruction-Guided Scene Text Recognition
Instruction-Guided Scene Text Recognition
Yongkun Du
Z. Chen
Yuchen Su
Caiyan Jia
Yu-Gang Jiang
68
3
0
03 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
91
45
0
03 Jan 2025
Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform
Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform
Cheonsu Jeong
70
0
0
01 Jan 2025
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang
Hang Zhang
Xin Li
Jiashuo Sun
Yongliang Shen
Weiming Lu
Deli Zhao
Yueting Zhuang
Lidong Bing
VLM
37
2
0
01 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
56
23
0
31 Dec 2024
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
Hanwen Jiang
Zexiang Xu
Desai Xie
Z. Chen
Haian Jin
...
Xin Sun
Jiuxiang Gu
Qixing Huang
Georgios Pavlakos
Hao Tan
118
1
0
18 Dec 2024
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia
M. Li
Hancheng Ye
Wenjie Wu
Hongbin Zhou
...
Conghui He
Botian Shi
Tao Chen
Junchi Yan
Bo Zhang
82
7
0
16 Dec 2024
Previous
12345...8910
Next