arXiv:2102.10407
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
Computer Vision and Pattern Recognition (CVPR), 2021
20 February 2021
Jun Chen
Han Guo
Kai Yi
Boyang Albert Li
Mohamed Elhoseiny
VLM
Papers citing "VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning" (50 of 165 shown)
The Solution for the CVPR2023 NICE Image Captioning Challenge
Xiangyu Wu
Yi Gao
Hailiang Zhang
Yang Yang
Weili Guo
Jianfeng Lu
10 Oct 2023
MAGMA: Music Aligned Generative Motion Autodecoder
Sohan Anisetty
Amit Raj
James Hays
03 Sep 2023
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Fengxiang Bie
Jianlong Wu
Zhongzhu Zhou
Adam Ghanem
Minjia Zhang
...
Pareesa Ameneh Golnari
David A. Clifton
Yuxiong He
Dacheng Tao
Shuaiwen Leon Song
EGVM
02 Sep 2023
Overcoming Generic Knowledge Loss with Selective Parameter Update
Computer Vision and Pattern Recognition (CVPR), 2023
Wenxuan Zhang
Paul Janson
Rahaf Aljundi
Mohamed Elhoseiny
KELM
CLL
23 Aug 2023
Learning to Model the World with Language
International Conference on Machine Learning (ICML), 2023
Jessy Lin
Yuqing Du
Olivia Watkins
Danijar Hafner
Pieter Abbeel
Dan Klein
Anca Dragan
LM&Ro
SyDa
31 Jul 2023
Text-guided Foundation Model Adaptation for Pathological Image Classification
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023
Yunkun Zhang
Jinglei Gao
Mu Zhou
Xiaosong Wang
Yu Qiao
Shaoting Zhang
Yi Xu
MedIm
27 Jul 2023
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation
ACM Multimedia Asia (MA), 2023
Zhiyuan Li
Dongnan Liu
Heng Wang
Chaoyi Zhang
Weidong (Tom) Cai
RALM
27 Jul 2023
Is attention all you need in medical image analysis? A review
IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2023
G. Papanastasiou
Nikolaos Dikaios
Jiahao Huang
Chengjia Wang
Guang Yang
ViT
MedIm
24 Jul 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
20 Jul 2023
ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey
S. Mohamadi
Ghulam Mujtaba
Ngan Le
Gianfranco Doretto
Don Adjeroh
LM&MA
AI4MH
09 Jul 2023
Image Background Serves as Good Proxy for Out-of-distribution Data
International Conference on Learning Representations (ICLR), 2023
Sen Pei
02 Jul 2023
Integrating Large Pre-trained Models into Multimodal Named Entity Recognition with Evidential Fusion
Weide Liu
Xiaoyang Zhong
Jingwen Hou
Shaohua Li
Haozhe Huang
Yuming Fang
EDL
29 Jun 2023
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Junting Pan
Ziyi Lin
Yuying Ge
Xiatian Zhu
Renrui Zhang
Yi Wang
Yu Qiao
Jiaming Song
MLLM
15 Jun 2023
Artificial General Intelligence for Medical Imaging
IEEE Reviews in Biomedical Engineering (RBME), 2023
Xiang Li
Lu Zhang
Zihao Wu
Zheng Liu
Lin Zhao
...
Pingkuan Yan
Shijie Zhao
Wen Liu
Tianming Liu
Hongtu Zhu
LM&MA
AI4CE
08 Jun 2023
Security Knowledge-Guided Fuzzing of Deep Learning Libraries
Nima Shiri Harzevili
Mohammad Mahdi Mohajer
Moshi Wei
H. Pham
Song Wang
AAML
AI4CE
05 Jun 2023
A survey of Generative AI Applications
Journal of Computer Science (JCS), 2023
Roberto Gozalo-Brizuela
Eduardo C. Garrido-Merchán
3DV
MedIm
05 Jun 2023
Exploring Open-Vocabulary Semantic Segmentation without Human Labels
Jun Chen
Deyao Zhu
Guocheng Qian
Guohao Li
Zhicheng Yan
Chenchen Zhu
Fanyi Xiao
Mohamed Elhoseiny
Sean Culatana
VLM
01 Jun 2023
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task
The Web Conference (WWW), 2023
Ning Ding
Yehui Tang
Zhongqian Fu
Chaoting Xu
Kai Han
Yunhe Wang
MLLM
VLM
01 Jun 2023
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
Chen Ling
Xujiang Zhao
Jiaying Lu
Chengyuan Deng
Can Zheng
...
Chris White
Quanquan Gu
Jian Pei
Carl Yang
Bo Pan
ALM
30 May 2023
Contextual Object Detection with Multimodal Large Language Models
International Journal of Computer Vision (IJCV), 2023
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjD
VLM
MLLM
29 May 2023
On Evaluating Adversarial Robustness of Large Vision-Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yunqing Zhao
Tianyu Pang
Chao Du
Xiao Yang
Chongxuan Li
Ngai-Man Cheung
Min Lin
VLM
AAML
MLLM
26 May 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
Neural Information Processing Systems (NeurIPS), 2023
Yao Mu
Qinglong Zhang
Mengkang Hu
Wen Wang
Mingyu Ding
Jun Jin
Sijin Yu
Jifeng Dai
Yu Qiao
Ping Luo
LM&Ro
LRM
24 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
22 May 2023
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models
Yixiong Chen
Li Liu
C. Ding
18 May 2023
MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts
Asian Conference on Computer Vision (ACCV), 2023
Qiuhui Chen
Xinyue Hu
Zirui Wang
Yi Hong
LM&MA
MedIm
18 May 2023
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter
Zheng Yuan
HU Xue
Kun Wang
Yongming Liu
Kun Wang
VLM
MLLM
12 May 2023
Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives
European Conference on Artificial Intelligence (ECAI), 2023
Bhanu Prakash Voutharoja
Lei Wang
Luping Zhou
MedIm
11 May 2023
Image-to-Text Translation for Interactive Image Recognition: A Comparative User Study with Non-Expert Users
Journal of Information Processing (JIP), 2023
Wataru Kawabe
Yusuke Sugano
VLM
11 May 2023
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023
Xiang Li
Congcong Wen
Yuan Hu
Zhenghang Yuan
Xiao Xiang Zhu
VLM
09 May 2023
Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime
Chuhan Zhang
Antoine Miech
Jiajun Shen
Jean-Baptiste Alayrac
Pauline Luc
VLM
VPVLM
03 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
International Conference on Learning Representations (ICLR), 2023
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
20 Apr 2023
Verbs in Action: Improving verb understanding in video-language models
IEEE International Conference on Computer Vision (ICCV), 2023
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
13 Apr 2023
Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT
Mingzhe Hu
Shaoyan Pan
Yuheng Li
Xiaofeng Yang
LM&MA
11 Apr 2023
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
Jun Chen
Deyao Zhu
Kilichbek Haydarov
Xiang Li
Mohamed Elhoseiny
09 Apr 2023
When Brain-inspired AI Meets AGI
Lin Zhao
Lu Zhang
Zihao Wu
Yuzhong Chen
Haixing Dai
...
Xi Jiang
Xiang Li
Dajiang Zhu
Hongtu Zhu
Tianming Liu
AI4CE
28 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM
VLM
20 Mar 2023
Decomposed Prototype Learning for Few-Shot Scene Graph Generation
Xingchen Li
Long Chen
Guikun Chen
Yinfu Feng
Yi Yang
Jun Xiao
20 Mar 2023
Cross-Modal Causal Intervention for Medical Report Generation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Weixing Chen
Yang-Yang Liu
Ce Wang
Jiarui Zhu
Shen Zhao
Guanbin Li
Cheng-Lin Liu
16 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
12 Mar 2023
Learning Combinatorial Prompts for Universal Controllable Image Captioning
International Journal of Computer Vision (IJCV), 2023
Zhen Wang
Jun Xiao
Yueting Zhuang
Fei Gao
Jian Shao
Long Chen
11 Mar 2023
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Chenfei Wu
Sheng-Kai Yin
Weizhen Qi
Xiaodong Wang
Zecheng Tang
Nan Duan
MLLM
LRM
08 Mar 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zhuolin Yang
Ming-Yu Liu
Zihan Liu
V. Korthikanti
Weili Nie
...
Yuke Zhu
Mohammad Shoeybi
Bryan Catanzaro
Chaowei Xiao
Anima Anandkumar
VLM
RALM
09 Feb 2023
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment
Neural Information Processing Systems (NeurIPS), 2023
Hao Liu
Wilson Yan
Pieter Abbeel
02 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
International Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
30 Jan 2023
ChatGPT is not all you need. A State of the Art Review of large Generative AI models
Roberto Gozalo-Brizuela
E.C. Garrido-Merchán
11 Jan 2023
Aesthetically Relevant Image Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Zhipeng Zhong
Fei Zhou
Guoping Qiu
25 Nov 2022
ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Youssef Mohamed
Mohamed AbdelFattah
Shyma Alhuwaider
Feifan Li
Xiangliang Zhang
Kenneth Church
Mohamed Elhoseiny
VLM
19 Nov 2022
One-Time Model Adaptation to Heterogeneous Clients: An Intra-Client and Inter-Image Attention Design
Yikai Yan
Chaoyue Niu
Fan Wu
Qinya Li
Shaojie Tang
Chengfei Lyu
Guihai Chen
11 Nov 2022
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
A. M. H. Tiong
Junnan Li
Boyang Albert Li
Silvio Savarese
Guosheng Lin
MLLM
17 Oct 2022
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Oscar Manas
Pau Rodríguez López
Saba Ahmadi
Aida Nematzadeh
Yash Goyal
Aishwarya Agrawal
VLM
VPVLM
13 Oct 2022