Flamingo: a Visual Language Model for Few-Shot Learning

29 April 2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
Yana Hasson
Karel Lenc
A. Mensch
Katie Millican
Malcolm Reynolds
Roman Ring
Eliza Rutherford
Serkan Cabi
Tengda Han
Zhitao Gong
Sina Samangooei
Marianne Monteiro
Jacob Menick
Sebastian Borgeaud
Andy Brock
Aida Nematzadeh
Sahand Sharifzadeh
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
    MLLM
    VLM
arXiv: 2204.14198

Papers citing "Flamingo: a Visual Language Model for Few-Shot Learning"

50 / 470 papers shown
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
Zihao Wang
Shaofei Cai
Guanzhou Chen
Anji Liu
Xiaojian Ma
Yitao Liang
LM&Ro
LLMAG
35
313
0
03 Feb 2023
IC3: Image Captioning by Committee Consensus
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
19
17
0
02 Feb 2023
Vision Learners Meet Web Image-Text Pairs
Bingchen Zhao
Quan Cui
Hao Wu
Osamu Yoshie
Cheng Yang
Oisin Mac Aodha
VLM
11
5
0
17 Jan 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Yikang Shen
Yining Hong
Hao Zhang
Chuang Gan
LRM
VLM
29
35
0
12 Jan 2023
Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study
Mariya Hendriksen
Svitlana Vakulenko
E. Kuiper
Maarten de Rijke
16
5
0
12 Jan 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
Jia Ning
Chen Li
Zheng-Wei Zhang
Zigang Geng
Qi Dai
Kun He
Han Hu
25
42
0
05 Jan 2023
Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation
Yue Han
Jiangning Zhang
Zhucun Xue
Chao Xu
Xintian Shen
Yabiao Wang
Chengjie Wang
Yong Liu
Xiangtai Li
27
16
0
03 Jan 2023
Task Ambiguity in Humans and Language Models
Alex Tamkin
Kunal Handa
Ava Shrestha
Noah D. Goodman
UQLM
18
22
0
20 Dec 2022
Position-guided Text Prompt for Vision-Language Pre-training
Alex Jinpeng Wang
Pan Zhou
Mike Zheng Shou
Shuicheng Yan
VLM
11
37
0
19 Dec 2022
A Survey on Natural Language Processing for Programming
Qingfu Zhu
Xianzhen Luo
Fang Liu
Cuiyun Gao
Wanxiang Che
13
1
0
12 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
31
15
0
08 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
22
421
0
08 Dec 2022
UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression
Jiaqi Chen
Tong Li
Jinghui Qin
Pan Lu
Liang Lin
Chongyu Chen
Xiaodan Liang
AIMat
LRM
30
89
0
06 Dec 2022
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
Xinlong Wang
Wen Wang
Yue Cao
Chunhua Shen
Tiejun Huang
VLM
MLLM
30
244
0
05 Dec 2022
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
Muhammad Ferjad Naeem
Muhammad Gul Zain Ali Khan
Yongqin Xian
Muhammad Zeshan Afzal
D. Stricker
Luc Van Gool
F. Tombari
VLM
22
51
0
05 Dec 2022
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering
Yao Zhang
Haokun Chen
A. Frikha
Yezi Yang
Denis Krompass
Gengyuan Zhang
Jindong Gu
Volker Tresp
VLM
LRM
8
7
0
19 Nov 2022
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
22
397
0
18 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
28
101
0
15 Nov 2022
Metaphors We Learn By
Roland Memisevic
11
0
0
11 Nov 2022
Towards Reasoning-Aware Explainable VQA
Rakesh Vaideeswaran
Feng Gao
Abhinav Mathur
Govind Thattai
LRM
22
3
0
09 Nov 2022
Okapi: Generalising Better by Making Statistical Matches Match
Myles Bartlett
Sara Romiti
V. Sharmanska
Novi Quadrianto
24
3
0
07 Nov 2022
A General Purpose Neural Architecture for Geospatial Systems
Nasim Rahaman
Martin Weiss
Frederik Trauble
Francesco Locatello
Alexandre Lacoste
Yoshua Bengio
C. Pal
Li Erran Li
Bernhard Schölkopf
AI4TS
AI4CE
19
5
0
04 Nov 2022
Training Vision-Language Models with Less Bimodal Supervision
Elad Segal
Ben Bogin
Jonathan Berant
VLM
19
2
0
01 Nov 2022
Composing Ensembles of Pre-trained Models via Iterative Consensus
Shuang Li
Yilun Du
J. Tenenbaum
Antonio Torralba
Igor Mordatch
MoMe
19
23
0
20 Oct 2022
AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness
Dacheng Li
Hongyi Wang
Eric P. Xing
Haotong Zhang
MoE
14
17
0
13 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
15
332
0
06 Oct 2022
Scaling Laws for a Multi-Agent Reinforcement Learning Model
Oren Neumann
C. Gros
21
26
0
29 Sep 2022
Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks
Tianwei Chen
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Hajime Nagahara
VLM
19
0
0
23 Aug 2022
Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning
Olivia Wiles
Isabela Albuquerque
Sven Gowal
VLM
27
44
0
18 Aug 2022
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP
VLM
25
97
0
10 Aug 2022
Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
Aleksandar Stanić
Yujin Tang
David R Ha
Jürgen Schmidhuber
ELM
13
11
0
05 Aug 2022
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
19
66
0
03 Aug 2022
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models
Huy Ha
Shuran Song
LM&Ro
VLM
25
101
0
23 Jul 2022
Do Artificial Intelligence Systems Understand?
Eduardo C. Garrido-Merchán
Carlos Blanco
6
6
0
22 Jul 2022
Machine Learning Model Sizes and the Parameter Gap
Pablo Villalobos
J. Sevilla
T. Besiroglu
Lennart Heim
A. Ho
Marius Hobbhahn
ALM
ELM
AI4CE
10
55
0
05 Jul 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
31
391
0
17 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
19
225
0
16 Jun 2022
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Matt Deitke
Eli VanderBilt
Alvaro Herrasti
Luca Weihs
Jordi Salvador
...
Winson Han
Eric Kolve
Ali Farhadi
Aniruddha Kembhavi
Roozbeh Mottaghi
LM&Ro
19
232
0
14 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
518
0
13 Jun 2022
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Muning Wen
J. Kuba
Runji Lin
Weinan Zhang
Ying Wen
J. Wang
Yaodong Yang
10
176
0
30 May 2022
A Generalist Agent
Scott E. Reed
Konrad Zolna
Emilio Parisotto
Sergio Gomez Colmenarejo
Alexander Novikov
...
Yutian Chen
R. Hadsell
Oriol Vinyals
Mahyar Bordbar
Nando de Freitas
LM&Ro
LLMAG
AI4CE
22
782
0
12 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
35
45
0
03 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
21
155
0
30 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
23
0
0
17 Apr 2022
Semantic Exploration from Language Abstractions and Pretrained Representations
Allison C. Tam
Neil C. Rabinowitz
Andrew Kyle Lampinen
Nicholas A. Roy
Stephanie C. Y. Chan
D. Strouse
Jane X. Wang
Andrea Banino
Felix Hill
LM&Ro
11
67
0
08 Apr 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
20
16
0
27 Mar 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
226
255
0
21 Mar 2022
Survey of Hallucination in Natural Language Generation
Ziwei Ji
Nayeon Lee
Rita Frieske
Tiezheng Yu
D. Su
...
Delong Chen
Wenliang Dai
Ho Shu Chan
Andrea Madotto
Pascale Fung
HILM
LRM
20
2,212
0
08 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
203
1,651
0
15 Oct 2021