2004.06165
Cited By
v1
v2
v3
v4
v5 (latest)
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
European Conference on Computer Vision (ECCV), 2020
13 April 2020
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
Lei Zhang
Lijuan Wang
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
Papers citing
"Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks"
50 / 1,171 papers shown
Title
MultiModalQA: Complex Question Answering over Text, Tables and Images
International Conference on Learning Representations (ICLR), 2021
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
174
195
0
13 Apr 2021
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
IEEE International Conference on Computer Vision (ICCV), 2021
Yuankai Qi
Zizheng Pan
Yicong Hong
Ming-Hsuan Yang
Anton Van Den Hengel
Qi Wu
LM&Ro
142
75
0
09 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
IEEE International Conference on Computer Vision (ICCV), 2021
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
175
110
0
05 Apr 2021
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
IEEE International Conference on Computer Vision (ICCV), 2021
Ben Graham
Alaaeldin El-Nouby
Hugo Touvron
Pierre Stock
Armand Joulin
Edouard Grave
Matthijs Douze
ViT
240
912
0
02 Apr 2021
VisQA: X-raying Vision and Language Reasoning in Transformers
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2021
Theo Jaunet
Corentin Kervadec
Romain Vuillemot
G. Antipov
M. Baccouche
Christian Wolf
154
32
0
02 Apr 2021
Towards General Purpose Vision Systems
Computer Vision and Pattern Recognition (CVPR), 2021
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
178
55
0
01 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
IEEE International Conference on Computer Vision (ICCV), 2021
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
502
1,386
0
01 Apr 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Computer Vision and Pattern Recognition (CVPR), 2021
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLM
VLM
133
101
0
01 Apr 2021
Zero-Shot Language Transfer vs Iterative Back Translation for Unsupervised Machine Translation
Aviral Joshi
Chengzhi Huang
H. Singh
91
0
0
31 Mar 2021
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
IEEE International Conference on Computer Vision (ICCV), 2021
Or Patashnik
Zongze Wu
Eli Shechtman
Daniel Cohen-Or
Dani Lischinski
CLIP
VLM
340
1,333
0
31 Mar 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Computer Vision and Pattern Recognition (CVPR), 2021
Antoine Miech
Jean-Baptiste Alayrac
Ivan Laptev
Josef Sivic
Andrew Zisserman
ViT
234
155
0
30 Mar 2021
Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
Computer Vision and Pattern Recognition (CVPR), 2021
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Linbo Jin
Ben Chen
Hao Zhou
Minghui Qiu
Ling Shao
VLM
198
130
0
30 Mar 2021
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Hila Chefer
Shir Gur
Lior Wolf
ViT
246
391
0
29 Mar 2021
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
IEEE International Conference on Computer Vision (ICCV), 2021
Pengchuan Zhang
Xiyang Dai
Jianwei Yang
Bin Xiao
Lu Yuan
Lei Zhang
Jianfeng Gao
ViT
206
360
0
29 Mar 2021
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval
Transactions of the Association for Computational Linguistics (TACL), 2021
Gregor Geigle
Jonas Pfeiffer
Nils Reimers
Ivan Vulić
Iryna Gurevych
222
60
0
22 Mar 2021
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Siqi Sun
Yen-Chun Chen
Linjie Li
Shuohang Wang
Yuwei Fang
Jingjing Liu
VLM
131
89
0
16 Mar 2021
SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels
Chenliang Li
Ming Yan
Haiyang Xu
Fuli Luo
Wei Wang
Bin Bi
Songfang Huang
VLM
106
39
0
14 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
International Journal of Computer Vision (IJCV), 2021
Andrew Shin
Masato Ishii
T. Narihira
185
45
0
06 Mar 2021
Causal Attention for Vision-Language Tasks
Computer Vision and Pattern Recognition (CVPR), 2021
Xu Yang
Hanwang Zhang
Guojun Qi
Jianfei Cai
CML
161
185
0
05 Mar 2021
Feature Boosting, Suppression, and Diversification for Fine-Grained Visual Classification
IEEE International Joint Conference on Neural Network (IJCNN), 2021
Jianwei Song
Ruoyu Yang
149
45
0
04 Mar 2021
M6: A Chinese Multimodal Pretrainer
Junyang Lin
Rui Men
An Yang
Chan Zhou
Ming Ding
...
Yong Li
Jialin Li
Jingren Zhou
J. Tang
Hongxia Yang
VLM
MoE
244
145
0
01 Mar 2021
Detecting Harmful Content On Online Platforms: What Platforms Need Vs. Where Research Efforts Go
ACM Computing Surveys (CSUR), 2021
Arnav Arora
Preslav Nakov
Momchil Hardalov
Sheikh Muhammad Sarwar
Vibha Nayak
...
Dimitrina Zlatkova
Kyle Dent
Ameya Bhatawdekar
Guillaume Bouchard
Isabelle Augenstein
167
64
0
27 Feb 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.8K
38,195
0
26 Feb 2021
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
Computer Vision and Pattern Recognition (CVPR), 2021
Jun Chen
Han Guo
Kai Yi
Boyang Albert Li
Mohamed Elhoseiny
VLM
316
261
0
20 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Computer Vision and Pattern Recognition (CVPR), 2021
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
838
1,302
0
17 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Computer Vision and Pattern Recognition (CVPR), 2021
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
324
730
0
11 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
International Conference on Machine Learning (ICML), 2021
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
1.0K
4,624
0
11 Feb 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
International Conference on Machine Learning (ICML), 2021
Wonjae Kim
Bokyung Son
Ildoo Kim
VLM
CLIP
459
2,019
0
05 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
International Conference on Machine Learning (ICML), 2021
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
525
596
0
04 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Transactions of the Association for Computational Linguistics (TACL), 2021
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
174
124
0
31 Jan 2021
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Computer Vision and Pattern Recognition (CVPR), 2021
Xudong Lin
Gedas Bertasius
Jue Wang
Shih-Fu Chang
Devi Parikh
Lorenzo Torresani
VGen
136
73
0
28 Jan 2021
Cross-lingual Visual Pre-training for Multimodal Machine Translation
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021
Ozan Caglayan
Menekse Kuyu
Mustafa Sercan Amac
Pranava Madhyastha
Erkut Erdem
Aykut Erdem
Lucia Specia
VLM
107
48
0
25 Jan 2021
ECOL-R: Encouraging Copying in Novel Object Captioning with Reinforcement Learning
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021
Yufei Wang
Ian D. Wood
Stephen Wan
Mark Johnson
99
8
0
25 Jan 2021
Latent Variable Models for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
181
5
0
16 Jan 2021
Are We There Yet? Learning to Localize in Embodied Instruction Following
Shane Storks
Qiaozi Gao
Govind Thattai
Gokhan Tur
LM&Ro
152
11
0
09 Jan 2021
Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search
Chen Gao
Guanyu Cai
Xinyang Jiang
Feng Zheng
Jinchao Zhang
Yifei Gong
Pai Peng
Xiao-Wei Guo
Xing Sun
DiffM
202
117
0
08 Jan 2021
Transformers in Vision: A Survey
ACM Computing Surveys (CSUR), 2021
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
693
2,959
0
04 Jan 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang
Xiujun Li
Xiaowei Hu
Jianwei Yang
Lei Zhang
Lijuan Wang
Yejin Choi
Jianfeng Gao
ObjD
VLM
409
167
0
02 Jan 2021
KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yiran Xing
Z. Shi
Zhao Meng
Gerhard Lakemeyer
Yunpu Ma
Roger Wattenhofer
VLM
208
43
0
02 Jan 2021
VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Xiaopeng Lu
Tiancheng Zhao
Kyusong Lee
161
29
0
01 Jan 2021
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua Wu
Haifeng Wang
539
403
0
31 Dec 2020
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts
Yuxian Meng
Shuhe Wang
Qinghong Han
Xiaofei Sun
Leilei Gan
Rui Yan
Jiwei Li
261
31
0
30 Dec 2020
Detecting Hate Speech in Multi-modal Memes
Abhishek Das
Japsimar Singh Wahi
Siyao Li
100
74
0
29 Dec 2020
A Multimodal Framework for the Detection of Hateful Memes
Phillip Lippe
Nithin Holla
Shantanu Chandra
S. Rajamanickam
Georgios Antoniou
Ekaterina Shutova
H. Yannakoudakis
179
87
0
23 Dec 2020
Object-Centric Diagnosis of Visual Reasoning
Jianwei Yang
Jiayuan Mao
Jiajun Wu
Devi Parikh
David D. Cox
J. Tenenbaum
Chuang Gan
OCL
122
17
0
21 Dec 2020
A Closer Look at the Robustness of Vision-and-Language Pre-trained Models
Linjie Li
Zhe Gan
Jingjing Liu
VLM
157
49
0
15 Dec 2020
Enhance Multimodal Transformer With External Label And In-Domain Pretrain: Hateful Meme Challenge Winning Solution
Ron Zhu
86
89
0
15 Dec 2020
Vilio: State-of-the-art Visio-Linguistic Models applied to Hateful Memes
Niklas Muennighoff
102
71
0
14 Dec 2020
MiniVLM: A Smaller and Faster Vision-Language Model
Jianfeng Wang
Xiaowei Hu
Pengchuan Zhang
Xiujun Li
Lijuan Wang
Guang Dai
Jianfeng Gao
Zicheng Liu
VLM
MLLM
150
67
0
13 Dec 2020
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang
Yijuan Lu
Jianfeng Wang
Xi Yin
D. Florêncio
Lijuan Wang
Cha Zhang
Lei Zhang
Jiebo Luo
VLM
182
156
0
08 Dec 2020