arXiv:2102.10407 (versions v1 to v5; v5 is the latest)

VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

Computer Vision and Pattern Recognition (CVPR), 2021
20 February 2021
Jun Chen
Han Guo
Kai Yi
Boyang Albert Li
Mohamed Elhoseiny
    VLM
ArXiv (abs) | PDF | HTML | GitHub (331★)

Papers citing "VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning"

15 / 165 papers shown
Medical Image Captioning via Generative Pretrained Transformers
Scientific Reports (Sci Rep), 2022
Alexander Selivanov
Oleg Y. Rogov
Daniil Chesakov
Artem Shelmanov
Irina Fedulova
Dmitry V. Dylov
MedIm
192
89
0
28 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
310
166
0
07 Sep 2022
Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding
International Conference on 3D Vision (3DV), 2022
Guocheng Qian
Abdullah Hamdi
Xingdi Zhang
Guohao Li
3DPC ViT
244
9
0
25 Aug 2022
Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model
International Society for Music Information Retrieval Conference (ISMIR), 2022
Yixiao Zhang
Junyan Jiang
Gus Xia
S. Dixon
124
10
0
24 Aug 2022
Personalized Showcases: Generating Multi-Modal Explanations for Recommendations
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022
An Yan
Zhankui He
Jiacheng Li
Tianyang Zhang
Julian McAuley
252
54
0
30 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Neural Information Processing Systems (NeurIPS), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
476
277
0
16 Jun 2022
LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks
Neural Information Processing Systems (NeurIPS), 2022
Tuan Dinh
Yuchen Zeng
Ruisu Zhang
Ziqian Lin
Michael Gira
Shashank Rajput
Jy-yong Sohn
Dimitris Papailiopoulos
Kangwook Lee
LMTD
559
167
0
14 Jun 2022
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Neural Information Processing Systems (NeurIPS), 2022
Yujia Xie
Luowei Zhou
Xiyang Dai
Lu Yuan
Nguyen Bach
Ce Liu
Michael Zeng
VLM MLLM
186
30
0
03 Jun 2022
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
341
40
0
01 Jun 2022
Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLM MLLM
270
111
0
05 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Neural Information Processing Systems (NeurIPS), 2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM VLM
695
4,861
0
29 Apr 2022
FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context
European Conference on Computer Vision (ECCV), 2022
Pinaki Nath Chowdhury
Aneeshan Sain
A. Bhunia
Tao Xiang
Yulia Gryaditskaya
Yi-Zhe Song
3DV
326
65
0
04 Mar 2022
Pretrained Language Models for Text Generation: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Junyi Li
Tianyi Tang
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
AI4CE
519
257
0
14 Jan 2022
Multimodal Few-Shot Learning with Frozen Language Models
Neural Information Processing Systems (NeurIPS), 2021
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
520
900
0
25 Jun 2021
Transflower: probabilistic autoregressive dance generation with multimodal attention
ACM Transactions on Graphics (TOG), 2021
Guillermo Valle Pérez
G. Henter
Jonas Beskow
A. Holzapfel
Pierre-Yves Oudeyer
Simon Alexanderson
371
48
0
25 Jun 2021