ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.14198
  4. Cited By
Flamingo: a Visual Language Model for Few-Shot Learning

Flamingo: a Visual Language Model for Few-Shot Learning

29 April 2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
Yana Hasson
Karel Lenc
A. Mensch
Katie Millican
Malcolm Reynolds
Roman Ring
Eliza Rutherford
Serkan Cabi
Tengda Han
Zhitao Gong
Sina Samangooei
Marianne Monteiro
Jacob Menick
Sebastian Borgeaud
Andy Brock
Aida Nematzadeh
Sahand Sharifzadeh
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
    MLLM
    VLM
ArXivPDFHTML

Papers citing "Flamingo: a Visual Language Model for Few-Shot Learning"

50 / 470 papers shown
Title
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He
Yihe Deng
Shixiang Tang
Qihao Chen
Qingsong Xie
...
Feng Zhu
Rui Zhao
Wanli Ouyang
Donglian Qi
Yunfeng Yan
65
18
0
13 Jun 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
36
185
0
12 Jun 2023
Joint Adaptive Representations for Image-Language Learning
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
14
0
0
31 May 2023
Multi-modal Queried Object Detection in the Wild
Multi-modal Queried Object Detection in the Wild
Yifan Xu
Mengdan Zhang
Chaoyou Fu
Peixian Chen
Xiaoshan Yang
Ke Li
Changsheng Xu
ObjD
VLM
15
29
0
30 May 2023
Learning without Forgetting for Vision-Language Models
Learning without Forgetting for Vision-Language Models
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
Jingyi Ning
De-Chuan Zhan
De-Chuan Zhan
Ziwei Liu
VLM
CLL
64
37
0
30 May 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
33
186
0
29 May 2023
Pre-training Contextualized World Models with In-the-wild Videos for
  Reinforcement Learning
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning
Jialong Wu
Haoyu Ma
Chao Deng
Mingsheng Long
OffRL
19
24
0
29 May 2023
InstructEdit: Improving Automatic Masks for Diffusion-based Image
  Editing With User Instructions
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions
Qian Wang
Biao Zhang
Michael Birsak
Peter Wonka
DiffM
18
30
0
29 May 2023
Adapting Language-Audio Models as Few-Shot Audio Learners
Adapting Language-Audio Models as Few-Shot Audio Learners
Jinhua Liang
Xubo Liu
Haohe Liu
Huy P Phan
Emmanouil Benetos
Mark D. Plumbley
Wenwu Wang
VLM
17
19
0
28 May 2023
Plug-and-Play Document Modules for Pre-trained Models
Plug-and-Play Document Modules for Pre-trained Models
Chaojun Xiao
Zhengyan Zhang
Xu Han
Chi-Min Chan
Yankai Lin
Zhiyuan Liu
Xiangyang Li
Zhonghua Li
Zhao Cao
Maosong Sun
KELM
22
5
0
28 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLM
MLLM
LRM
37
42
0
24 May 2023
Improving Factuality and Reasoning in Language Models through Multiagent
  Debate
Improving Factuality and Reasoning in Language Models through Multiagent Debate
Yilun Du
Shuang Li
Antonio Torralba
J. Tenenbaum
Igor Mordatch
LLMAG
LRM
36
592
0
23 May 2023
Pengi: An Audio Language Model for Audio Tasks
Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
25
155
0
19 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner
  and Dense Captioner
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
J. Liu
6
1
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited
  Modalities
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
13
113
0
18 May 2023
Semantically Aligned Task Decomposition in Multi-Agent Reinforcement
  Learning
Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning
Wenhao Li
Dan Qiao
Baoxiang Wang
Xiangfeng Wang
Bo Jin
H. Zha
16
5
0
18 May 2023
OpenShape: Scaling Up 3D Shape Representation Towards Open-World
  Understanding
OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding
Minghua Liu
Ruoxi Shi
Kaiming Kuang
Yinhao Zhu
Xuanlin Li
Shizhong Han
H. Cai
Fatih Porikli
Hao Su
3DPC
22
115
0
18 May 2023
What You See is What You Read? Improving Text-Image Alignment Evaluation
What You See is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom
Yonatan Bitton
Soravit Changpinyo
Roee Aharoni
Jonathan Herzig
Oran Lang
E. Ofek
Idan Szpektor
EGVM
31
72
0
17 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
25
688
0
17 May 2023
Self-Chained Image-Language Model for Video Localization and Question
  Answering
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu
Jaemin Cho
Prateek Yadav
Mohit Bansal
31
129
0
11 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
28
34
0
05 May 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
15
85
0
02 May 2023
Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation
Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation
Mohit Sharma
Claudio Fantacci
Yuxiang Zhou
Skanda Koppula
N. Heess
Jonathan Scholz
Y. Aytar
VLM
33
28
0
13 Apr 2023
Efficient Multimodal Fusion via Interactive Prompting
Efficient Multimodal Fusion via Interactive Prompting
Yaowei Li
Ruijie Quan
Linchao Zhu
Yezhou Yang
20
42
0
13 Apr 2023
Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
Ahmet Iscen
Alireza Fathi
Cordelia Schmid
VLM
3DV
13
25
0
11 Apr 2023
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
Noa Garcia
Yusuke Hirota
Yankun Wu
Yuta Nakashima
EGVM
20
50
0
06 Apr 2023
Evaluating Synthetic Pre-Training for Handwriting Processing Tasks
Evaluating Synthetic Pre-Training for Handwriting Processing Tasks
Vittorio Pippi
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
17
5
0
04 Apr 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for
  Scene-Text VQA
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Yongxin Zhu
Z. Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
16
6
0
04 Apr 2023
The Vector Grounding Problem
The Vector Grounding Problem
Dimitri Coelho Mollo
Raphael Milliere
20
25
0
04 Apr 2023
RegionPLC: Regional Point-Language Contrastive Learning for Open-World
  3D Scene Understanding
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Jihan Yang
Runyu Ding
Weipeng Deng
Zhe Wang
Xiaojuan Qi
10
61
0
03 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Procedure-Aware Pretraining for Instructional Video Understanding
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
23
38
0
31 Mar 2023
Language Models can Solve Computer Tasks
Language Models can Solve Computer Tasks
Geunwoo Kim
Pierre Baldi
Stephen Marcus McAleer
LLMAG
LM&Ro
14
336
0
30 Mar 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM
VLM
12
23
0
29 Mar 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot
  AV-ASR
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
25
15
0
29 Mar 2023
Your Diffusion Model is Secretly a Zero-Shot Classifier
Your Diffusion Model is Secretly a Zero-Shot Classifier
Alexander C. Li
Mihir Prabhudesai
Shivam Duggal
Ellis L Brown
Deepak Pathak
DiffM
VLM
21
221
0
28 Mar 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init
  Attention
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Peng Gao
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Hongsheng Li
Yu Qiao
MLLM
23
736
0
28 Mar 2023
Machine Learning for Brain Disorders: Transformers and Visual
  Transformers
Machine Learning for Brain Disorders: Transformers and Visual Transformers
Robin Courant
Maika Edberg
Nicolas Dufour
Vicky Kalogeiton
MedIm
ViT
17
1
0
21 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
23
30
0
21 Mar 2023
Location-Free Scene Graph Generation
Location-Free Scene Graph Generation
Ege Ozsoy
Felix Holm
Tobias Czempiel
Tobias Czempiel
Benjamin Busam
Nassir Navab
Benjamin Busam
37
4
0
20 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
16
428
0
14 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of
  Generative AI from GAN to ChatGPT
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
19
493
0
07 Mar 2023
Real-World Humanoid Locomotion with Reinforcement Learning
Real-World Humanoid Locomotion with Reinforcement Learning
Ilija Radosavovic
Tete Xiao
Bike Zhang
Trevor Darrell
Jitendra Malik
K. Sreenath
15
117
0
06 Mar 2023
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive
  Learning
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
Hritik Bansal
Nishad Singhi
Yu Yang
Fan Yin
Aditya Grover
Kai-Wei Chang
AAML
24
41
0
06 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
86
11
0
03 Mar 2023
Reward Design with Language Models
Reward Design with Language Models
Minae Kwon
Sang Michael Xie
Kalesha Bullard
Dorsa Sadigh
LM&Ro
8
197
0
27 Feb 2023
Knowledge-enhanced Visual-Language Pre-training on Chest Radiology
  Images
Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images
Xiaoman Zhang
Chaoyi Wu
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
MedIm
11
118
0
27 Feb 2023
Few-shot Multimodal Multitask Multilingual Learning
Few-shot Multimodal Multitask Multilingual Learning
Aman Chadha
Vinija Jain
31
0
0
19 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
8
7
0
16 Feb 2023
The Edge of Orthogonality: A Simple View of What Makes BYOL Tick
The Edge of Orthogonality: A Simple View of What Makes BYOL Tick
Pierre Harvey Richemond
Allison C. Tam
Yunhao Tang
Florian Strub
Bilal Piot
Felix Hill
SSL
23
9
0
09 Feb 2023
Contrast with Reconstruct: Contrastive 3D Representation Learning Guided
  by Generative Pretraining
Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Zekun Qi
Runpei Dong
Guo Fan
Zheng Ge
Xiangyu Zhang
Kaisheng Ma
Li Yi
28
117
0
05 Feb 2023
Previous
123...10789
Next