1908.02265
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,232 papers shown
End-to-end Multi-modal Video Temporal Grounding
Yi-Wen Chen
Yi-Hsuan Tsai
Ming-Hsuan Yang
268
61
0
12 Jul 2021
MECT: Multi-Metadata Embedding based Cross-Transformer for Chinese Named Entity Recognition
Shuang Wu
Xiaoning Song
Zhenhua Feng
212
130
0
12 Jul 2021
BERT-like Pre-training for Symbolic Piano Music Classification Tasks
Yi-Hui Chou
I-Chun Chen
Chin-Jui Chang
Joann Ching
Yi-Hsuan Yang
275
28
0
12 Jul 2021
Zero-Shot Compositional Concept Learning
Guangyue Xu
Parisa Kordjamshidi
J. Chai
CoGe
226
24
0
12 Jul 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
2.2K
8,106
0
07 Jul 2021
Deep Learning for Embodied Vision Navigation: A Survey
Fengda Zhu
Yi Zhu
Vincent CS Lee
Xiaodan Liang
Xiaojun Chang
EgoV
LM&Ro
506
0
0
07 Jul 2021
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
Zineng Tang
Jaemin Cho
Hao Tan
Joey Tianyi Zhou
VLM
195
34
0
06 Jul 2021
PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior for Joint Image-Text Modeling
Xiaoxue Zang
Lijuan Liu
Maria Wang
Yang Song
Hao Zhang
Jindong Chen
VLM
259
65
0
06 Jul 2021
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory
Xuejiao Tang
Xin Huang
Wenbin Zhang
T. Child
Qiong Hu
Zhen Liu
Ji Zhang
LRM
201
20
0
04 Jul 2021
Target-dependent UNITER: A Transformer-Based Multimodal Language Comprehension Model for Domestic Service Robots
Shintaro Ishikawa
K. Sugiura
171
13
0
02 Jul 2021
Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions
Motonari Kambara
K. Sugiura
ViT
150
6
0
02 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
416
112
0
01 Jul 2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation
Jing Liu
Xinxin Zhu
Fei Liu
Longteng Guo
Zijia Zhao
...
Weining Wang
Hanqing Lu
Shiyu Zhou
Jiajun Zhang
Jinqiao Wang
306
41
0
01 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Neural Information Processing Systems (NeurIPS), 2021
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
590
710
0
30 Jun 2021
The Values Encoded in Machine Learning Research
Conference on Fairness, Accountability and Transparency (FAccT), 2021
Abeba Birhane
Pratyusha Kalluri
Dallas Card
William Agnew
Ravit Dotan
Michelle Bao
354
342
0
29 Jun 2021
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
215
2
0
28 Jun 2021
UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Hwanhee Lee
Seunghyun Yoon
Franck Dernoncourt
Trung Bui
Kyomin Jung
VLM
249
51
0
26 Jun 2021
Core Challenges in Embodied Vision-Language Planning
Journal of Artificial Intelligence Research (JAIR), 2021
Jonathan M Francis
Nariaki Kitamura
Felix Labelle
Xiaopeng Lu
Ingrid Navarro
Jean Oh
LM&Ro
547
58
0
26 Jun 2021
Multimodal Few-Shot Learning with Frozen Language Models
Neural Information Processing Systems (NeurIPS), 2021
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
557
907
0
25 Jun 2021
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue
Yupan Huang
Bei Liu
Houwen Peng
Jianlong Fu
Houqiang Li
Jiebo Luo
414
94
0
25 Jun 2021
A Picture May Be Worth a Hundred Words for Visual Question Answering
Yusuke Hirota
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Ittetsu Taniguchi
Takao Onoye
ViT
148
4
0
25 Jun 2021
iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability
Andrew Wang
Vasu Sharma
CML
146
5
0
25 Jun 2021
A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021
Keda Lu
Bo Fang
Kuan-Yu Chen
ViT
96
2
0
24 Jun 2021
DocFormer: End-to-End Transformer for Document Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
ViT
380
353
0
22 Jun 2021
Towards Long-Form Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2021
Chaoxia Wu
Philipp Krahenbuhl
VLM
ViT
335
195
0
21 Jun 2021
GEM: A General Evaluation Benchmark for Multimodal Tasks
Findings (Findings), 2021
Lin Su
Nan Duan
Edward Cui
Lei Ji
Chenfei Wu
Huaishao Luo
Yongfei Liu
Ming Zhong
Taroon Bharti
Arun Sacheti
VLM
228
22
0
18 Jun 2021
Efficient Self-supervised Vision Transformers for Representation Learning
International Conference on Learning Representations (ICLR), 2021
Chunyuan Li
Jianwei Yang
Pengchuan Zhang
Mei Gao
Bin Xiao
Xiyang Dai
Lu Yuan
Jianfeng Gao
ViT
313
225
0
17 Jun 2021
Probing Image-Language Transformers for Verb Understanding
Lisa Anne Hendricks
Aida Nematzadeh
216
132
0
16 Jun 2021
A Fair and Comprehensive Comparison of Multimodal Tweet Sentiment Analysis Methods
Gullal Singh Cheema
Sherzod Hakimov
Eric Müller-Budack
Ralph Ewerth
174
27
0
16 Jun 2021
Vision-Language Navigation with Random Environmental Mixup
IEEE International Conference on Computer Vision (ICCV), 2021
Chong Liu
Fengda Zhu
Xiaojun Chang
Xiaodan Liang
Zongyuan Ge
Yi-Dong Shen
LM&Ro
301
107
0
15 Jun 2021
Pre-Trained Models: Past, Present and Future
AI Open (AO), 2021
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
392
995
0
14 Jun 2021
Assessing Multilingual Fairness in Pre-trained Multimodal Representations
Findings (Findings), 2021
Jialu Wang
Yang Liu
Xinze Wang
EGVM
238
42
0
12 Jun 2021
Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization
Ludan Ruan
Jieting Chen
Yuqing Song
Shizhe Chen
Qin Jin
93
0
0
11 Jun 2021
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation
Findings (Findings), 2021
Wanrong Zhu
Xinze Wang
An Yan
Miguel P. Eckstein
Wenjie Wang
147
7
0
10 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Neural Information Processing Systems (NeurIPS), 2021
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
293
342
0
09 Jun 2021
Bayesian Attention Belief Networks
International Conference on Machine Learning (ICML), 2021
Shujian Zhang
Xinjie Fan
Bo Chen
Mingyuan Zhou
BDL
261
36
0
09 Jun 2021
PAM: Understanding Product Images in Cross Product Category Attribute Extraction
Knowledge Discovery and Data Mining (KDD), 2021
Rongmei Lin
Xiang He
J. Feng
Nasser Zalmout
Yan Liang
Li Xiong
Xin Luna Dong
215
38
0
08 Jun 2021
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
Neural Information Processing Systems (NeurIPS), 2021
Tianlong Chen
Yu Cheng
Zhe Gan
Lu Yuan
Lei Zhang
Zinan Lin
ViT
260
259
0
08 Jun 2021
BERTGEN: Multi-task Generation through BERT
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Faidon Mitzalis
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
VLM
123
7
0
07 Jun 2021
SelfDoc: Self-Supervised Document Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2021
Peizhao Li
Jiuxiang Gu
Jason Kuen
Vlad I. Morariu
Handong Zhao
R. Jain
Varun Manjunatha
Hongfu Liu
ViT
SSL
196
180
0
07 Jun 2021
Oriented Object Detection with Transformer
Teli Ma
Mingyuan Mao
Honghui Zheng
Shiyang Feng
Xiaodi Wang
Shumin Han
Errui Ding
Baochang Zhang
David Doermann
ViT
161
59
0
06 Jun 2021
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Neural Information Processing Systems (NeurIPS), 2021
Muchen Li
Leonid Sigal
ObjD
343
239
0
06 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
Neural Information Processing Systems (NeurIPS), 2021
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
358
430
0
04 Jun 2021
Human-Adversarial Visual Question Answering
Neural Information Processing Systems (NeurIPS), 2021
Sasha Sheng
Amanpreet Singh
Vedanuj Goswami
Jose Alberto Lopez Magana
Wojciech Galuba
Devi Parikh
Douwe Kiela
OOD
EgoV
AAML
128
69
0
04 Jun 2021
Scalable Transformers for Neural Machine Translation
Shiyang Feng
Shijie Geng
Ping Luo
Xiaogang Wang
Jifeng Dai
Jiaming Song
231
14
0
04 Jun 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Haiyang Xu
Ming Yan
Chenliang Li
Bin Bi
Songfang Huang
Wenming Xiao
Fei Huang
VLM
324
128
0
03 Jun 2021
TVDIM: Enhancing Image Self-Supervised Pretraining via Noisy Text Data
Pengda Qin
Yuhong Li
Kefeng Deng
Qiang Wu
126
1
0
03 Jun 2021
Attention mechanisms and deep learning for machine vision: A survey of the state of the art
A. M. Hafiz
S. A. Parah
R. A. Bhat
229
57
0
03 Jun 2021
More Identifiable yet Equally Performant Transformers for Text Classification
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Rishabh Bhardwaj
Navonil Majumder
Soujanya Poria
Eduard H. Hovy
75
7
0
02 Jun 2021
Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features
International Conference on Content-Based Multimedia Indexing (CBMI), 2021
Nicola Messina
Giuseppe Amato
Fabrizio Falchi
Claudio Gennaro
Stéphane Marchand-Maillet
98
8
0
01 Jun 2021