VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

Computer Vision and Pattern Recognition (CVPR), 2021
20 February 2021
Jun Chen
Han Guo
Kai Yi
Boyang Albert Li
Mohamed Elhoseiny
    VLM

Papers citing "VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning"

50 / 165 papers shown
The Solution for the CVPR2023 NICE Image Captioning Challenge
Xiangyu Wu
Yi Gao
Hailiang Zhang
Yang Yang
Weili Guo
Jianfeng Lu
290
1
0
10 Oct 2023
MAGMA: Music Aligned Generative Motion Autodecoder
Sohan Anisetty
Amit Raj
James Hays
151
0
0
03 Sep 2023
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Fengxiang Bie
Jianlong Wu
Zhongzhu Zhou
Adam Ghanem
Minjia Zhang
...
Pareesa Ameneh Golnari
David A. Clifton
Yuxiong He
Dacheng Tao
Shuaiwen Leon Song
EGVM
254
56
0
02 Sep 2023
Overcoming Generic Knowledge Loss with Selective Parameter Update
Computer Vision and Pattern Recognition (CVPR), 2023
Wenxuan Zhang
Paul Janson
Rahaf Aljundi
Mohamed Elhoseiny
KELM, CLL
377
20
0
23 Aug 2023
Learning to Model the World with Language
International Conference on Machine Learning (ICML), 2023
Jessy Lin
Yuqing Du
Olivia Watkins
Danijar Hafner
Pieter Abbeel
Dan Klein
Anca Dragan
LM&Ro, SyDa
289
71
0
31 Jul 2023
Text-guided Foundation Model Adaptation for Pathological Image Classification
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023
Yunkun Zhang
Jinglei Gao
Mu Zhou
Xiaosong Wang
Yu Qiao
Shaoting Zhang
Yi Xu
MedIm
162
61
0
27 Jul 2023
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation
ACM Multimedia Asia (MA), 2023
Zhiyuan Li
Dongnan Liu
Heng Wang
Chaoyi Zhang
Weidong (Tom) Cai
RALM
188
1
0
27 Jul 2023
Is attention all you need in medical image analysis? A review
IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2023
G. Papanastasiou
Nikolaos Dikaios
Jiahao Huang
Chengjia Wang
Guang Yang
ViT, MedIm
223
49
0
24 Jul 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
300
65
0
20 Jul 2023
ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey
S. Mohamadi
Ghulam Mujtaba
Ngan Le
Gianfranco Doretto
Don Adjeroh
LM&MA, AI4MH
303
37
0
09 Jul 2023
Image Background Serves as Good Proxy for Out-of-distribution Data
International Conference on Learning Representations (ICLR), 2023
Sen Pei
271
3
0
02 Jul 2023
Integrating Large Pre-trained Models into Multimodal Named Entity Recognition with Evidential Fusion
Weide Liu
Xiaoyang Zhong
Jingwen Hou
Shaohua Li
Haozhe Huang
Yuming Fang
EDL
179
5
0
29 Jun 2023
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Junting Pan
Ziyi Lin
Yuying Ge
Xiatian Zhu
Renrui Zhang
Yi Wang
Yu Qiao
Jiaming Song
MLLM
177
35
0
15 Jun 2023
Artificial General Intelligence for Medical Imaging
IEEE Reviews in Biomedical Engineering (RBME), 2023
Xiang Li
Lu Zhang
Zihao Wu
Zheng Liu
Lin Zhao
...
Pingkuan Yan
Shijie Zhao
Wen Liu
Tianming Liu
Hongtu Zhu
LM&MA, AI4CE
293
57
0
08 Jun 2023
Security Knowledge-Guided Fuzzing of Deep Learning Libraries
Nima Shiri Harzevili
Mohammad Mahdi Mohajer
Moshi Wei
H. Pham
Song Wang
AAML, AI4CE
198
1
0
05 Jun 2023
A survey of Generative AI Applications
Journal of Computer Science (JCS), 2023
Roberto Gozalo-Brizuela
Eduardo C. Garrido-Merchán
3DV, MedIm
378
135
0
05 Jun 2023
Exploring Open-Vocabulary Semantic Segmentation without Human Labels
Jun Chen
Deyao Zhu
Guocheng Qian
Guohao Li
Zhicheng Yan
Chenchen Zhu
Fanyi Xiao
Mohamed Elhoseiny
Sean Culatana
VLM
220
12
0
01 Jun 2023
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task
The Web Conference (WWW), 2023
Ning Ding
Yehui Tang
Zhongqian Fu
Chaoting Xu
Kai Han
Yunhe Wang
MLLM, VLM
125
2
0
01 Jun 2023
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
Chen Ling
Xujiang Zhao
Jiaying Lu
Chengyuan Deng
Can Zheng
...
Chris White
Quanquan Gu
Jian Pei
Carl Yang
Bo Pan
ALM
410
214
0
30 May 2023
Contextual Object Detection with Multimodal Large Language Models
International Journal of Computer Vision (IJCV), 2023
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjD, VLM, MLLM
325
141
0
29 May 2023
On Evaluating Adversarial Robustness of Large Vision-Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yunqing Zhao
Tianyu Pang
Chao Du
Xiao Yang
Chongxuan Li
Ngai-Man Cheung
Min Lin
VLM, AAML, MLLM
480
264
0
26 May 2023
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
Neural Information Processing Systems (NeurIPS), 2023
Yao Mu
Qinglong Zhang
Mengkang Hu
Wen Wang
Mingyu Ding
Jun Jin
Sijin Yu
Jifeng Dai
Yu Qiao
Ping Luo
LM&Ro, LRM
389
348
0
24 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
261
113
0
22 May 2023
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models
Yixiong Chen
Li Liu
C. Ding
174
29
0
18 May 2023
MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts
Asian Conference on Computer Vision (ACCV), 2023
Qiuhui Chen
Xinyue Hu
Zirui Wang
Yi Hong
LM&MA, MedIm
174
67
0
18 May 2023
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter
Zheng Yuan
HU Xue
Kun Wang
Yongming Liu
Kun Wang
VLM, MLLM
378
12
0
12 May 2023
Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives
European Conference on Artificial Intelligence (ECAI), 2023
Bhanu Prakash Voutharoja
Lei Wang
Luping Zhou
MedIm
147
13
0
11 May 2023
Image-to-Text Translation for Interactive Image Recognition: A Comparative User Study with Non-Expert Users
Journal of Information Processing (JIP), 2023
Wataru Kawabe
Yusuke Sugano
VLM
153
2
0
11 May 2023
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023
Xiang Li
Congcong Wen
Yuan Hu
Zhenghang Yuan
Xiao Xiang Zhu
VLM
352
159
0
09 May 2023
Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime
Chuhan Zhang
Antoine Miech
Jiajun Shen
Jean-Baptiste Alayrac
Pauline Luc
VLM, VPVLM
228
2
0
03 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
International Conference on Learning Representations (ICLR), 2023
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM, MLLM
467
2,724
0
20 Apr 2023
Verbs in Action: Improving verb understanding in video-language models
IEEE International Conference on Computer Vision (ICCV), 2023
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
373
87
0
13 Apr 2023
Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT
Mingzhe Hu
Shaoyan Pan
Yuheng Li
Xiaofeng Yang
LM&MA
233
49
0
11 Apr 2023
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
Jun Chen
Deyao Zhu
Kilichbek Haydarov
Xiang Li
Mohamed Elhoseiny
264
44
0
09 Apr 2023
When Brain-inspired AI Meets AGI
Lin Zhao
Lu Zhang
Zihao Wu
Yuzhong Chen
Haixing Dai
...
Xi Jiang
Xiang Li
Dajiang Zhu
Hongtu Zhu
Tianming Liu
AI4CE
168
115
0
28 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM, VLM
417
34
0
20 Mar 2023
Decomposed Prototype Learning for Few-Shot Scene Graph Generation
Xingchen Li
Long Chen
Guikun Chen
Yinfu Feng
Yi Yang
Jun Xiao
176
7
0
20 Mar 2023
Cross-Modal Causal Intervention for Medical Report Generation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Weixing Chen
Yang-Yang Liu
Ce Wang
Jiarui Zhu
Shen Zhao
Guanbin Li
Cheng-Lin Liu
329
7
0
16 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
236
125
0
12 Mar 2023
Learning Combinatorial Prompts for Universal Controllable Image Captioning
International Journal of Computer Vision (IJCV), 2023
Zhen Wang
Jun Xiao
Yueting Zhuang
Fei Gao
Jian Shao
Long Chen
200
12
0
11 Mar 2023
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Chenfei Wu
Sheng-Kai Yin
Weizhen Qi
Xiaodong Wang
Zecheng Tang
Nan Duan
MLLM, LRM
358
765
0
08 Mar 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zhuolin Yang
Ming-Yu Liu
Zihan Liu
V. Korthikanti
Weili Nie
...
Yuke Zhu
Mohammad Shoeybi
Bryan Catanzaro
Chaowei Xiao
Anima Anandkumar
VLM, RALM
203
50
0
09 Feb 2023
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment
Neural Information Processing Systems (NeurIPS), 2023
Hao Liu
Wilson Yan
Pieter Abbeel
254
34
0
02 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
International Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
1.3K
6,661
0
30 Jan 2023
ChatGPT is not all you need. A State of the Art Review of large Generative AI models
Roberto Gozalo-Brizuela
E.C. Garrido-Merchán
243
328
0
11 Jan 2023
Aesthetically Relevant Image Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Zhipeng Zhong
Fei Zhou
Guoping Qiu
125
15
0
25 Nov 2022
ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Youssef Mohamed
Mohamed AbdelFattah
Shyma Alhuwaider
Feifan Li
Xiangliang Zhang
Kenneth Church
Mohamed Elhoseiny
VLM
229
18
0
19 Nov 2022
One-Time Model Adaptation to Heterogeneous Clients: An Intra-Client and Inter-Image Attention Design
Yikai Yan
Chaoyue Niu
Fan Wu
Qinya Li
Shaojie Tang
Chengfei Lyu
Guihai Chen
164
0
0
11 Nov 2022
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
A. M. H. Tiong
Junnan Li
Boyang Albert Li
Silvio Savarese
Guosheng Lin
MLLM
256
130
0
17 Oct 2022
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Oscar Manas
Pau Rodríguez López
Saba Ahmadi
Aida Nematzadeh
Yash Goyal
Aishwarya Agrawal
VLM, VPVLM
261
58
0
13 Oct 2022