2004.06165
Cited By
v1
v2
v3
v4
v5 (latest)
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
European Conference on Computer Vision (ECCV), 2020
13 April 2020
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
Lei Zhang
Lijuan Wang
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
Papers citing
"Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks"
21 / 1,171 papers shown
Title
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Transactions of the Association for Computational Linguistics (TACL), 2020
Emanuele Bugliarello
Ryan Cotterell
Naoaki Okazaki
Desmond Elliott
160
124
0
30 Nov 2020
A Recurrent Vision-and-Language BERT for Navigation
Computer Vision and Pattern Recognition (CVPR), 2020
Yicong Hong
Qi Wu
Yuankai Qi
Cristian Rodriguez-Opazo
Stephen Gould
LM&Ro
222
358
0
26 Nov 2020
Multimodal Learning for Hateful Memes Detection
Yi Zhou
Zhenhao Chen
205
71
0
25 Nov 2020
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
145
3
0
18 Nov 2020
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Liunian Harold Li
Haoxuan You
Zhecan Wang
Alireza Zareian
Shih-Fu Chang
Kai-Wei Chang
SSL
VLM
147
12
0
24 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumder
Soujanya Poria
Roger Zimmermann
Amir Zadeh
221
6
0
19 Oct 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Hao Tan
Mohit Bansal
CLIP
139
126
0
14 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
M. Farazi
Salman Khan
Nick Barnes
72
3
0
05 Oct 2020
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning
Xiaowei Hu
Xi Yin
Kevin Lin
Lijuan Wang
Lei Zhang
Jianfeng Gao
Zicheng Liu
VLM
163
58
0
28 Sep 2020
Weakly supervised cross-domain alignment with optimal transport
Siyang Yuan
Ke Bai
Liqun Chen
Yizhe Zhang
Chenyang Tao
Chunyuan Li
Guoyin Wang
Ricardo Henao
Lawrence Carin
OT
98
7
0
14 Aug 2020
Decomposing Generation Networks with Structure Prediction for Recipe Generation
Pattern Recognition (Pattern Recognit.), 2020
Hao Wang
Guosheng Lin
Steven C. H. Hoi
Chunyan Miao
73
3
0
27 Jul 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
275
398
0
30 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Computer Vision and Pattern Recognition (CVPR), 2020
Karan Desai
Justin Johnson
SSL
VLM
344
457
0
11 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Neural Information Processing Systems (NeurIPS), 2020
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
297
527
0
11 Jun 2020
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
Minheng Ni
Haoyang Huang
Lin Su
Edward Cui
Taroon Bharti
Lijuan Wang
Jianfeng Gao
Dongdong Zhang
Nan Duan
154
7
0
04 Jun 2020
TIME: Text and Image Mutual-Translation Adversarial Networks
AAAI Conference on Artificial Intelligence (AAAI), 2020
Bingchen Liu
Kunpeng Song
Yizhe Zhu
Gerard de Melo
Ahmed Elgammal
93
34
0
27 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
365
531
0
01 May 2020
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Natural Language Processing and Chinese Computing (NLPCC), 2020
Qiaolin Xia
Haoyang Huang
Nan Duan
Dongdong Zhang
Lei Ji
Zhifang Sui
Edward Cui
Taroon Bharti
Xin Liu
Ming Zhou
MLLM
VLM
167
84
0
03 Mar 2020
Accuracy vs. Complexity: A Trade-off in Visual Question Answering Models
Pattern Recognition (Pattern Recognit.), 2020
M. Farazi
Salman H. Khan
Nick Barnes
148
18
0
20 Jan 2020
Explain and Improve: LRP-Inference Fine-Tuning for Image Captioning Models
Information Fusion (Inf. Fusion), 2020
Jiamei Sun
Sebastian Lapuschkin
Wojciech Samek
Alexander Binder
FAtt
294
34
0
04 Jan 2020
CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Difei Gao
Ruiping Wang
Shiguang Shan
Xilin Chen
CoGe
LRM
177
36
0
08 Aug 2019