2004.06165
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
European Conference on Computer Vision (ECCV), 2020
13 April 2020
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
Lei Zhang
Lijuan Wang
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
Papers citing
"Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks"
50 / 1,171 papers shown
Title
StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
Rinon Gal
Or Patashnik
Haggai Maron
Gal Chechik
Daniel Cohen-Or
CLIP
VLM
187
264
0
02 Aug 2021
A Thorough Review on Recent Deep Learning Methodologies for Image Captioning
Ahmed Elhagry
Karima Kadaoui
VLM
84
20
0
28 Jul 2021
Is Object Detection Necessary for Human-Object Interaction Recognition?
Ying Jin
Yinpeng Chen
Lijuan Wang
Jianfeng Wang
Pei Yu
Zicheng Liu
Lei Li
106
7
0
27 Jul 2021
Language Models as Zero-shot Visual Semantic Learners
Yue Jiao
Jonathon S. Hare
Adam Prugel-Bennett
VLM
70
1
0
26 Jul 2021
What Remains of Visual Semantic Embeddings
Yue Jiao
Jonathon S. Hare
Adam Prugel-Bennett
VLM
58
0
0
26 Jul 2021
Multi-stage Pre-training over Simplified Multimodal Pre-training Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Tongtong Liu
Fangxiang Feng
Caixia Yuan
65
16
0
22 Jul 2021
Separating Skills and Concepts for Novel Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2021
Spencer Whitehead
Hui Wu
Heng Ji
Rogerio Feris
Kate Saenko
CoGe
135
38
0
19 Jul 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Neural Information Processing Systems (NeurIPS), 2021
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
666
2,357
0
16 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
259
329
0
14 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
420
458
0
13 Jul 2021
End-to-end Multi-modal Video Temporal Grounding
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
139
57
0
12 Jul 2021
Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers
International Conference on Learning Representations (ICLR), 2021
Ruihan Yang
Minghao Zhang
Nicklas Hansen
Huazhe Xu
Xiaolong Wang
OffRL
228
128
0
08 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
297
107
0
01 Jul 2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation
Jing Liu
Xinxin Zhu
Fei Liu
Longteng Guo
Zijia Zhao
...
Weining Wang
Hanqing Lu
Shiyu Zhou
Jiajun Zhang
Jinqiao Wang
172
41
0
01 Jul 2021
Multimodal Few-Shot Learning with Frozen Language Models
Neural Information Processing Systems (NeurIPS), 2021
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
407
878
0
25 Jun 2021
A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021
Keda Lu
Bo Fang
Kuan-Yu Chen
ViT
68
2
0
24 Jun 2021
GEM: A General Evaluation Benchmark for Multimodal Tasks
Findings (Findings), 2021
Lin Su
Nan Duan
Edward Cui
Lei Ji
Chenfei Wu
Huaishao Luo
Yongfei Liu
Ming Zhong
Taroon Bharti
Arun Sacheti
VLM
141
21
0
18 Jun 2021
Efficient Self-supervised Vision Transformers for Representation Learning
International Conference on Learning Representations (ICLR), 2021
Chunyuan Li
Jianwei Yang
Pengchuan Zhang
Mei Gao
Bin Xiao
Xiyang Dai
Lu Yuan
Jianfeng Gao
ViT
195
221
0
17 Jun 2021
Semi-Autoregressive Transformer for Image Captioning
Yuanen Zhou
Yong Zhang
Zhenzhen Hu
Meng Wang
VLM
100
27
0
17 Jun 2021
Probing Image-Language Transformers for Verb Understanding
Lisa Anne Hendricks
Aida Nematzadeh
126
130
0
16 Jun 2021
A Fair and Comprehensive Comparison of Multimodal Tweet Sentiment Analysis Methods
Gullal Singh Cheema
Sherzod Hakimov
Eric Müller-Budack
Ralph Ewerth
103
24
0
16 Jun 2021
Understanding and Evaluating Racial Biases in Image Captioning
Dora Zhao
Angelina Wang
Olga Russakovsky
196
153
0
16 Jun 2021
Pre-Trained Models: Past, Present and Future
AI Open (AO), 2021
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
253
952
0
14 Jun 2021
Assessing Multilingual Fairness in Pre-trained Multimodal Representations
Findings (Findings), 2021
Jialu Wang
Yang Liu
Xinze Wang
EGVM
168
41
0
12 Jun 2021
Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization
Ludan Ruan
Jieting Chen
Yuqing Song
Shizhe Chen
Qin Jin
48
0
0
11 Jun 2021
Check It Again: Progressive Visual Question Answering via Visual Entailment
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Q. Si
Zheng Lin
Mingyu Zheng
Peng Fu
Weiping Wang
103
50
0
08 Jun 2021
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
Neural Information Processing Systems (NeurIPS), 2021
Tianlong Chen
Yu Cheng
Zhe Gan
Lu Yuan
Lei Zhang
Zhangyang Wang
ViT
134
245
0
08 Jun 2021
BERTGEN: Multi-task Generation through BERT
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Faidon Mitzalis
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
VLM
80
7
0
07 Jun 2021
Human-Adversarial Visual Question Answering
Neural Information Processing Systems (NeurIPS), 2021
Sasha Sheng
Amanpreet Singh
Vedanuj Goswami
Jose Alberto Lopez Magana
Wojciech Galuba
Devi Parikh
Douwe Kiela
OOD
EgoV
AAML
74
68
0
04 Jun 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Haiyang Xu
Ming Yan
Chenliang Li
Bin Bi
Songfang Huang
Wenming Xiao
Fei Huang
VLM
230
123
0
03 Jun 2021
Learning to Select: A Fully Attentive Approach for Novel Object Captioning
International Conference on Multimedia Retrieval (ICMR), 2021
Marco Cagrandi
Marcella Cornia
Matteo Stefanini
Lorenzo Baraldi
Rita Cucchiara
121
9
0
02 Jun 2021
M6-T: Exploring Sparse Expert Models and Beyond
An Yang
Junyang Lin
Rui Men
Chang Zhou
Le Jiang
...
Dingyang Zhang
Jialin Li
Lin Qu
Jingren Zhou
Hongxia Yang
MoE
224
24
0
31 May 2021
Modeling Text-visual Mutual Dependency for Multi-modal Dialog Generation
Shuhe Wang
Yuxian Meng
Xiaofei Sun
Leilei Gan
Rongbin Ouyang
Rui Yan
Tianwei Zhang
Jiwei Li
128
15
0
30 May 2021
Learning Relation Alignment for Calibrated Cross-modal Retrieval
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Shuhuai Ren
Junyang Lin
Guangxiang Zhao
Rui Men
An Yang
Jingren Zhou
Xu Sun
Hongxia Yang
144
39
0
28 May 2021
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation
Computer Vision and Pattern Recognition (CVPR), 2021
Tao Tu
Q. Ping
Govind Thattai
Gokhan Tur
Premkumar Natarajan
117
18
0
24 May 2021
More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-Text Matching
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2021
Yuxiao Chen
Jianbo Yuan
Long Zhao
Tianlang Chen
R. Luo
Larry S. Davis
Dimitris N. Metaxas
94
10
0
20 May 2021
Dependent Multi-Task Learning with Causal Intervention for Image Captioning
International Joint Conference on Artificial Intelligence (IJCAI), 2021
Wenqing Chen
Jidong Tian
Caoyun Fan
Hao He
Yaohui Jin
CML
175
8
0
18 May 2021
Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
International Conference on Machine Learning and Applications (ICMLA), 2021
K. Ueki
117
5
0
16 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Artificial Intelligence Review (AIR), 2021
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Xiaoshi Zhong
611
309
0
10 May 2021
Comparing Visual Reasoning in Humans and AI
Shravan Murlidaran
Wenjie Wang
Miguel P. Eckstein
140
1
0
29 Apr 2021
Multimodal Contrastive Training for Visual Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2021
Xin Yuan
Zhe Lin
Jason Kuen
Jianming Zhang
Yilin Wang
Michael Maire
Ajinkya Kale
Baldo Faieta
SSL
189
181
0
26 Apr 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
481
1,015
0
26 Apr 2021
Playing Lottery Tickets with Vision and Language
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
250
62
0
23 Apr 2021
Understanding Synonymous Referring Expressions via Contrastive Features
International Journal of Computer Vision (IJCV), 2021
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
ObjD
115
5
0
20 Apr 2021
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
ACM Multimedia (ACM MM), 2021
Chenyi Lei
Shixian Luo
Yong Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
Chunyan Miao
Houqiang Li
103
46
0
19 Apr 2021
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
Yiheng Xu
Tengchao Lv
Lei Cui
Guoxin Wang
Yijuan Lu
D. Florêncio
Cha Zhang
Furu Wei
MLLM
VLM
194
154
0
18 Apr 2021
TransVG: End-to-End Visual Grounding with Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
282
422
0
17 Apr 2021
LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding
Te-Lin Wu
Cheng-rong Li
Mingyang Zhang
Tao Chen
Spurthi Amba Hombaiah
Michael Bendersky
115
15
0
16 Apr 2021
Concadia: Towards Image-Based Text Generation with a Purpose
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Elisa Kreiss
Fei Fang
Noah D. Goodman
Christopher Potts
162
24
0
16 Apr 2021
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Shir Gur
Natalia Neverova
C. Stauffer
Ser-Nam Lim
Douwe Kiela
A. Reiter
171
36
0
16 Apr 2021