ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.14198
  4. Cited By
Flamingo: a Visual Language Model for Few-Shot Learning

Flamingo: a Visual Language Model for Few-Shot Learning

29 April 2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
Yana Hasson
Karel Lenc
A. Mensch
Katie Millican
Malcolm Reynolds
Roman Ring
Eliza Rutherford
Serkan Cabi
Tengda Han
Zhitao Gong
Sina Samangooei
Marianne Monteiro
Jacob Menick
Sebastian Borgeaud
Andy Brock
Aida Nematzadeh
Sahand Sharifzadeh
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
    MLLM
    VLM
ArXivPDFHTML

Papers citing "Flamingo: a Visual Language Model for Few-Shot Learning"

50 / 470 papers shown
Title
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu
Yuhao Dong
Ziwei Liu
Winston Hu
Jiwen Lu
Yongming Rao
ObjD
71
54
0
19 Sep 2024
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting
Yongqi Wang
Xinxiao Wu
Shuo Yang
Jiebo Luo
44
0
0
19 Sep 2024
Generative Semantic Communication via Textual Prompts: Latency Performance Tradeoffs
Generative Semantic Communication via Textual Prompts: Latency Performance Tradeoffs
Mengmeng Ren
Li Qiao
Long Yang
Zhen Gao
Jian Chen
Mahdi Boloursaz Mashhadi
Pei Xiao
Rahim Tafazolli
Mehdi Bennis
VLM
89
4
0
15 Sep 2024
Relevance for Human Robot Collaboration
Relevance for Human Robot Collaboration
Xiaotong Zhang
Dingcheng Huang
Kamal Youcef-Toumi
33
2
0
12 Sep 2024
Securing Vision-Language Models with a Robust Encoder Against Jailbreak
  and Adversarial Attacks
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
Md Zarif Hossain
Ahmed Imteaj
AAML
VLM
38
3
0
11 Sep 2024
What to align in multimodal contrastive learning?
What to align in multimodal contrastive learning?
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
22
3
0
11 Sep 2024
Revisiting Prompt Pretraining of Vision-Language Models
Revisiting Prompt Pretraining of Vision-Language Models
Zhenyuan Chen
Lingfeng Yang
Shuo Chen
Zhaowei Chen
Jiajun Liang
Xiang Li
MLLM
VPVLM
VLM
33
1
0
10 Sep 2024
Multi-Modal Adapter for Vision-Language Models
Multi-Modal Adapter for Vision-Language Models
Dominykas Seputis
Serghei Mihailov
Soham Chatterjee
Zehao Xiao
VLM
14
1
0
03 Sep 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring
  Expression Segmentation
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen
Wei-Hua Li
Cheng Sun
Yu-Chiang Frank Wang
Chu-Song Chen
VLM
30
10
0
01 Sep 2024
From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal
  Embedding Space
From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal Embedding Space
Andrew Hamara
Pablo Rivas
16
1
0
30 Aug 2024
Law of Vision Representation in MLLMs
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
40
9
0
29 Aug 2024
FungiTastic: A multi-modal dataset and benchmark for image categorization
FungiTastic: A multi-modal dataset and benchmark for image categorization
Lukás Picek
Klara Janouskova
Milan Šulc
Jirí Matas
72
1
0
24 Aug 2024
Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction
Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction
Anthony GX-Chen
Kenneth Marino
Rob Fergus
OCL
45
0
0
21 Aug 2024
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
Feipeng Ma
Yizhou Zhou
Hebei Li
Zilong He
Siying Wu
Fengyun Rao
Siying Wu
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
29
3
0
21 Aug 2024
Beyond the Hype: A dispassionate look at vision-language models in medical scenario
Beyond the Hype: A dispassionate look at vision-language models in medical scenario
Yang Nan
Huichi Zhou
Xiaodan Xing
Guang Yang
46
3
0
16 Aug 2024
Visual Agents as Fast and Slow Thinkers
Visual Agents as Fast and Slow Thinkers
Guangyan Sun
Mingyu Jin
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Ying Nian Wu
Dongfang Liu
Dongfang Liu
LLMAG
LRM
74
11
0
16 Aug 2024
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
Sayna Ebrahimi
Sercan Ö. Arik
Tejas Nama
Tomas Pfister
37
1
0
13 Aug 2024
VL-TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments
VL-TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments
Daeun Song
Jing Liang
Xuesu Xiao
Dinesh Manocha
44
4
0
05 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
41
5
0
31 Jul 2024
OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance
OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance
Yongqiang Yao
Jingru Tan
Jiahao Hu
Feizhao Zhang
Xin Jin
...
Ruihao Gong
Pengfei Liu
Pengfei Liu
Dahua Lin
Ningyi Xu
VLM
38
1
0
30 Jul 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
21
1
0
30 Jul 2024
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen
Lingxi Xie
Xinyue Huo
Xuehui Yu
Xiaopeng Zhang
Yingfei Sun
Zhenjun Han
Qi Tian
VLM
53
1
0
23 Jul 2024
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language
  Models
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Mingze Xu
Mingfei Gao
Zhe Gan
Hong-You Chen
Zhengfeng Lai
Haiming Gang
Kai Kang
Afshin Dehghan
48
48
0
22 Jul 2024
Learning Visual Grounding from Generative Vision and Language Model
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
32
5
0
18 Jul 2024
ViLLa: Video Reasoning Segmentation with Large Language Model
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
VOS
LRM
49
2
0
18 Jul 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
25
3
0
17 Jul 2024
Controllable Contextualized Image Captioning: Directing the Visual
  Narrative through User-Defined Highlights
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
Shunqi Mao
Chaoyi Zhang
Hang Su
Hwanjun Song
Igor Shalyminov
Weidong Cai
26
1
0
16 Jul 2024
GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via
  VLM
GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM
Keshav Bimbraw
Ye Wang
Jing Liu
T. Koike-Akino
VLM
MedIm
LM&MA
27
1
0
15 Jul 2024
Representation Learning and Identity Adversarial Training for Facial Behavior Understanding
Representation Learning and Identity Adversarial Training for Facial Behavior Understanding
Mang Ning
A. A. Salah
Itir Onal Ertugrul
CVBM
68
4
0
15 Jul 2024
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
Wentao Zhao
Jiaming Chen
Ziyu Meng
Donghui Mao
Ran Song
Wei Zhang
33
8
0
13 Jul 2024
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
Lecheng Kong
Jiarui Feng
Hao Liu
Chengsong Huang
Jiaxin Huang
Yixin Chen
Muhan Zhang
AI4CE
77
6
0
12 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
58
4
0
09 Jul 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting,
  and Transportation
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Yuhan Zhu
Yuyang Ji
Zhiyu Zhao
Gangshan Wu
Limin Wang
VLM
39
7
0
05 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description
  Models
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
29
52
0
30 Jun 2024
GM-DF: Generalized Multi-Scenario Deepfake Detection
GM-DF: Generalized Multi-Scenario Deepfake Detection
Yingxin Lai
Zitong Yu
Jing Yang
Bin Li
Xiangui Kang
Linlin Shen
21
7
0
28 Jun 2024
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang
Dong Shen
Chaoxiang Cai
Fan Yang
Size Li
Di Zhang
Xi Li
MoE
39
2
0
28 Jun 2024
ColPali: Efficient Document Retrieval with Vision Language Models
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
C´eline Hudelot
Pierre Colombo
VLM
54
21
0
27 Jun 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
69
30
0
24 Jun 2024
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Shengkang Wang
Hongzhan Lin
Ziyang Luo
Zhen Ye
Guang Chen
Jing Ma
53
3
0
17 Jun 2024
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen
Lin Li
Yongqi Yang
Bin Wen
Fan Yang
Tingting Gao
Yu Wu
Long Chen
VLM
VGen
43
6
0
15 Jun 2024
First Multi-Dimensional Evaluation of Flowchart Comprehension for
  Multimodal Large Language Models
First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models
Enming Zhang
Ruobing Yao
Huanyong Liu
Junhui Yu
Jiale Wang
ELM
LRM
37
0
0
14 Jun 2024
What If We Recaption Billions of Web Images with LLaMA-3?
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li
Haoqin Tu
Mude Hui
Zeyu Wang
Bingchen Zhao
...
Jieru Mei
Qing Liu
Huangjie Zheng
Yuyin Zhou
Cihang Xie
VLM
MLLM
28
34
0
12 Jun 2024
LVBench: An Extreme Long Video Understanding Benchmark
LVBench: An Extreme Long Video Understanding Benchmark
Weihan Wang
Zehai He
Wenyi Hong
Yean Cheng
Xiaohan Zhang
...
Shiyu Huang
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
ELM
VLM
38
63
0
12 Jun 2024
Can Large Language Models Understand Spatial Audio?
Can Large Language Models Understand Spatial Audio?
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
...
Jun Zhang
Lu Lu
Zejun Ma
Yuxuan Wang
Chao Zhang
44
4
0
12 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM
ViT
39
77
0
11 Jun 2024
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
X. Wang
Siming Fu
Qihan Huang
Wanggui He
Hao Jiang
DiffM
34
40
0
11 Jun 2024
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
Sijia Chen
Yibo Wang
Yi-Feng Wu
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
Lijun Zhang
LLMAG
LRM
48
10
0
11 Jun 2024
F-LMM: Grounding Frozen Large Multimodal Models
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
70
12
0
09 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
59
25
0
07 Jun 2024
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing Yang
Xuweiyi Chen
Nikhil Madaan
Madhavan Iyengar
Shengyi Qian
David Fouhey
Joyce Chai
3DV
65
11
0
07 Jun 2024
Previous
12345...8910
Next