2102.10407
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
Computer Vision and Pattern Recognition (CVPR), 2021
20 February 2021
Jun Chen
Han Guo
Kai Yi
Boyang Albert Li
Mohamed Elhoseiny
VLM
Papers citing "VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning"
15 / 165 papers shown
Medical Image Captioning via Generative Pretrained Transformers
Scientific Reports (Sci Rep), 2022
Alexander Selivanov
Oleg Y. Rogov
Daniil Chesakov
Artem Shelmanov
Irina Fedulova
Dmitry V. Dylov
MedIm
192
89
0
28 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
310
166
0
07 Sep 2022
Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding
International Conference on 3D Vision (3DV), 2022
Guocheng Qian
Abdullah Hamdi
Xingdi Zhang
Guohao Li
3DPC
ViT
244
9
0
25 Aug 2022
Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model
International Society for Music Information Retrieval Conference (ISMIR), 2022
Yixiao Zhang
Junyan Jiang
Gus Xia
S. Dixon
124
10
0
24 Aug 2022
Personalized Showcases: Generating Multi-Modal Explanations for Recommendations
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022
An Yan
Zhankui He
Jiacheng Li
Tianyang Zhang
Julian McAuley
252
54
0
30 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Neural Information Processing Systems (NeurIPS), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
476
277
0
16 Jun 2022
LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks
Neural Information Processing Systems (NeurIPS), 2022
Tuan Dinh
Yuchen Zeng
Ruisu Zhang
Ziqian Lin
Michael Gira
Shashank Rajput
Jy-yong Sohn
Dimitris Papailiopoulos
Kangwook Lee
LMTD
559
167
0
14 Jun 2022
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Neural Information Processing Systems (NeurIPS), 2022
Yujia Xie
Luowei Zhou
Xiyang Dai
Lu Yuan
Nguyen Bach
Ce Liu
Michael Zeng
VLM
MLLM
186
30
0
03 Jun 2022
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
341
40
0
01 Jun 2022
Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLM
MLLM
270
111
0
05 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Neural Information Processing Systems (NeurIPS), 2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
695
4,861
0
29 Apr 2022
FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context
European Conference on Computer Vision (ECCV), 2022
Pinaki Nath Chowdhury
Aneeshan Sain
A. Bhunia
Tao Xiang
Yulia Gryaditskaya
Yi-Zhe Song
3DV
326
65
0
04 Mar 2022
Pretrained Language Models for Text Generation: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Junyi Li
Tianyi Tang
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
AI4CE
519
257
0
14 Jan 2022
Multimodal Few-Shot Learning with Frozen Language Models
Neural Information Processing Systems (NeurIPS), 2021
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
520
900
0
25 Jun 2021
Transflower: probabilistic autoregressive dance generation with multimodal attention
ACM Transactions on Graphics (TOG), 2021
Guillermo Valle Pérez
G. Henter
Jonas Beskow
A. Holzapfel
Pierre-Yves Oudeyer
Simon Alexanderson
371
48
0
25 Jun 2021