VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

Computer Vision and Pattern Recognition (CVPR), 2021

20 February 2021

Papers citing "VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning"

50 / 165 papers shown

The Solution for the CVPR2023 NICE Image Captioning Challenge

290

10 Oct 2023

MAGMA: Music Aligned Generative Motion Autodecoder

Sohan Anisetty

Amit Raj

James Hays

151

03 Sep 2023

RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

...

Pareesa Ameneh Golnari

Yuxiong He

254

02 Sep 2023

Overcoming Generic Knowledge Loss with Selective Parameter Update

Computer Vision and Pattern Recognition (CVPR), 2023

377

23 Aug 2023

Learning to Model the World with Language

International Conference on Machine Learning (ICML), 2023

Pieter Abbeel

289

31 Jul 2023

Text-guided Foundation Model Adaptation for Pathological Image Classification

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023

Yunkun Zhang

Jinglei Gao

Mu Zhou

Xiaosong Wang

Yu Qiao

Shaoting Zhang

Yi Xu

MedIm

162

27 Jul 2023

Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation

ACM Multimedia Asia (MA), 2023

Zhiyuan Li

Dongnan Liu

Heng Wang

Chaoyi Zhang

Weidong (Tom) Cai

RALM

188

27 Jul 2023

Is attention all you need in medical image analysis? A review

IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2023

Jiahao Huang

Guang Yang

223

24 Jul 2023

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023

300

20 Jul 2023

ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey

Ngan Le

303

09 Jul 2023

Image Background Serves as Good Proxy for Out-of-distribution Data

International Conference on Learning Representations (ICLR), 2023

Sen Pei

271

02 Jul 2023

Integrating Large Pre-trained Models into Multimodal Named Entity Recognition with Evidential Fusion

179

29 Jun 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models

Yi Wang

Yu Qiao

Jiaming Song

MLLM

177

15 Jun 2023

Artificial General Intelligence for Medical Imaging

IEEE Reviews in Biomedical Engineering (RBME), 2023

Xiang Li

Lu Zhang

...

Tianming Liu

293

08 Jun 2023

Security Knowledge-Guided Fuzzing of Deep Learning Libraries

Nima Shiri Harzevili

Mohammad Mahdi Mohajer

198

05 Jun 2023

A survey of Generative AI Applications

Journal of Computer Science (JCS), 2023

Roberto Gozalo-Brizuela

Eduardo C. Garrido-Merchán

3DV MedIm

378

135

05 Jun 2023

Exploring Open-Vocabulary Semantic Segmentation without Human Labels

220

01 Jun 2023

GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task

The Web Conference (WWW), 2023

125

01 Jun 2023

Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey

ACM Computing Surveys (ACM Comput. Surv.), 2023

...

Quanquan Gu

410

214

30 May 2023

Contextual Object Detection with Multimodal Large Language Models

International Journal of Computer Vision (IJCV), 2023

325

141

29 May 2023

On Evaluating Adversarial Robustness of Large Vision-Language Models

Neural Information Processing Systems (NeurIPS), 2023

480

264

26 May 2023

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

Neural Information Processing Systems (NeurIPS), 2023

Mingyu Ding

Yu Qiao

Ping Luo

LM&Ro LRM

389

348

24 May 2023

VideoLLM: Modeling Video Sequence with Large Language Models

Yifei Huang

...

Yi Wang

Yu Qiao

261

113

22 May 2023

X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models

Yixiong Chen

Li Liu

C. Ding

174

18 May 2023

MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts

Asian Conference on Computer Vision (ACCV), 2023

174

18 May 2023

ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter

Kun Wang

378

12 May 2023

Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives

European Conference on Artificial Intelligence (ECAI), 2023

Bhanu Prakash Voutharoja

Lei Wang

Luping Zhou

MedIm

147

11 May 2023

Image-to-Text Translation for Interactive Image Recognition: A Comparative User Study with Non-Expert Users

Journal of Information Processing (JIP), 2023

Wataru Kawabe

Yusuke Sugano

VLM

153

11 May 2023

Vision-Language Models in Remote Sensing: Current Progress and Future Trends

IEEE Geoscience and Remote Sensing Magazine (GRSM), 2023

Xiao Xiang Zhu

352

159

09 May 2023

Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime

Chuhan Zhang

Antoine Miech

Jiajun Shen

Jean-Baptiste Alayrac

Pauline Luc

VLM VPVLM

228

03 May 2023

MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

International Conference on Learning Representations (ICLR), 2023

467

2,724

20 Apr 2023

Verbs in Action: Improving verb understanding in video-language models

IEEE International Conference on Computer Vision (ICCV), 2023

373

13 Apr 2023

Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT

233

11 Apr 2023

Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions

264

09 Apr 2023

When Brain-inspired AI Meets AGI

Lu Zhang

...

Xiang Li

Dajiang Zhu

Hongtu Zhu

Tianming Liu

AI4CE

168

115

28 Mar 2023

eP-ALM: Efficient Perceptual Augmentation of Language Models

IEEE International Conference on Computer Vision (ICCV), 2023

417

20 Mar 2023

Decomposed Prototype Learning for Few-Shot Scene Graph Generation

Guikun Chen

Yi Yang

176

20 Mar 2023

Cross-Modal Causal Intervention for Medical Report Generation

IEEE Transactions on Image Processing (IEEE TIP), 2023

329

16 Mar 2023

ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

236

125

12 Mar 2023

Learning Combinatorial Prompts for Universal Controllable Image Captioning

International Journal of Computer Vision (IJCV), 2023

Zhen Wang

Jun Xiao

Yueting Zhuang

Fei Gao

Jian Shao

Long Chen

200

11 Mar 2023

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

358

765

08 Mar 2023

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

...

203

09 Feb 2023

Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment

Neural Information Processing Systems (NeurIPS), 2023

Hao Liu

Wilson Yan

Pieter Abbeel

254

02 Feb 2023

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

International Conference on Machine Learning (ICML), 2023

Silvio Savarese

1.3K

6,661

30 Jan 2023

ChatGPT is not all you need. A State of the Art Review of large Generative AI models

Roberto Gozalo-Brizuela

E.C. Garrido-Merchán

243

328

11 Jan 2023

Aesthetically Relevant Image Captioning

AAAI Conference on Artificial Intelligence (AAAI), 2022

Zhipeng Zhong

Fei Zhou

Guoping Qiu

125

25 Nov 2022

ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Feifan Li

229

19 Nov 2022

One-Time Model Adaptation to Heterogeneous Clients: An Intra-Client and Inter-Image Attention Design

164

11 Nov 2022

Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Silvio Savarese

256

130

17 Oct 2022

MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

261

13 Oct 2022