Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
1908.02265
Cited By
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2019
6 August 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks"
50 / 2,232 papers shown
Efficient Multi-Modal Embeddings from Structured Data
A. Vero
Ann A. Copestake
118
4
0
06 Oct 2021
Word Acquisition in Neural Language Models
Tyler A. Chang
Benjamin Bergen
268
46
0
05 Oct 2021
A Survey On Neural Word Embeddings
Erhan Sezerer
Selma Tekir
AI4TS
271
20
0
05 Oct 2021
ProTo: Program-Guided Transformer for Program-Guided Tasks
Zelin Zhao
Karan Samel
Binghong Chen
Le Song
ViT
LM&Ro
260
32
0
02 Oct 2021
Visually Grounded Concept Composition
Bowen Zhang
Hexiang Hu
Linlu Qiu
Peter Shaw
Fei Sha
CoGe
192
7
0
29 Sep 2021
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
Edoardo Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
483
202
0
28 Sep 2021
Audio-to-Image Cross-Modal Generation
IEEE International Joint Conference on Neural Network (IJCNN), 2021
Maciej Żelaszczyk
Jacek Mańdziuk
DiffM
202
20
0
27 Sep 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Conference on Computational Natural Language Learning (CoNLL), 2021
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
172
21
0
27 Sep 2021
Why Do We Click: Visual Impression-aware News Recommendation
ACM Multimedia (ACM MM), 2021
Jiahao Xun
Shengyu Zhang
Zhou Zhao
Jieming Zhu
Tao Gui
Jingjie Li
Xiuqiang He
Xiaofei He
Tat-Seng Chua
Leilei Gan
238
39
0
26 Sep 2021
Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Linlu Qiu
Hexiang Hu
Bowen Zhang
Peter Shaw
Fei Sha
172
23
0
25 Sep 2021
MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling
Tarik Arici
M. S. Seyfioglu
T. Neiman
Yi Tian Xu
Son N. Tran
Trishul Chilimbi
Belinda Zeng
Ismail B. Tutar
VLM
121
16
0
24 Sep 2021
CLIPort: What and Where Pathways for Robotic Manipulation
Conference on Robot Learning (CoRL), 2021
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
344
819
0
24 Sep 2021
Detecting Harmful Memes and Their Targets
Findings (Findings), 2021
Shraman Pramanick
Dimitar Dimitrov
Rituparna Mukherjee
Shivam Sharma
Md. Shad Akhtar
Preslav Nakov
Tanmoy Chakraborty
182
151
0
24 Sep 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
594
244
0
24 Sep 2021
Dense Contrastive Visual-Linguistic Pretraining
ACM Multimedia (ACM MM), 2021
Lei Shi
Kai Shuang
Shijie Geng
Shiyang Feng
Zuohui Fu
Gerard de Melo
Yunpeng Chen
Sen Su
VLM
SSL
240
12
0
24 Sep 2021
Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2021
Tobias Norlund
Lovisa Hagström
Richard Johansson
275
25
0
23 Sep 2021
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark
ACM Multimedia (ACM MM), 2021
Xun Gao
Yin Zhao
Jie Zhang
Longjun Cai
142
9
0
23 Sep 2021
Cross-Modal Coherence for Text-to-Image Retrieval
AAAI Conference on Artificial Intelligence (AAAI), 2021
Malihe Alikhani
Fangda Han
Hareesh Ravi
Mubbasir Kapadia
Vladimir Pavlovic
Matthew Stone
179
10
0
22 Sep 2021
Caption Enriched Samples for Improving Hateful Memes Detection
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Efrat Blaier
Itzik Malkiel
Lior Wolf
VLM
166
23
0
22 Sep 2021
COVR: A test-bed for Visually Grounded Compositional Generalization with real images
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Ben Bogin
Shivanshu Gupta
Matt Gardner
Jonathan Berant
CoGe
180
30
0
22 Sep 2021
KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation
Yongfei Liu
Chenfei Wu
Shao-Yen Tseng
Vasudev Lal
Xuming He
Nan Duan
CLIP
VLM
283
32
0
22 Sep 2021
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
210
50
0
21 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
415
467
0
17 Sep 2021
An End-to-End Transformer Model for 3D Object Detection
Ishan Misra
Rohit Girdhar
Armand Joulin
3DPC
ViT
436
574
0
16 Sep 2021
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Eric Wang
Zhi Wang
Wenwu Zhu
321
57
0
16 Sep 2021
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering
Ander Salaberria
Gorka Azkune
Oier López de Lacalle
Aitor Soroa Etxabe
Eneko Agirre
301
73
0
15 Sep 2021
What Vision-Language Models `See' when they See Scenes
Michele Cafagna
Kees van Deemter
Albert Gatt
VLM
264
13
0
15 Sep 2021
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
Da Yin
Liunian Harold Li
Ziniu Hu
Nanyun Peng
Kai-Wei Chang
300
65
0
14 Sep 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil
Cheng Zhang
D. Xuan
Wei-Lun Chao
264
23
0
13 Sep 2021
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
362
80
0
13 Sep 2021
TEASEL: A Transformer-Based Speech-Prefixed Language Model
Mehdi Arjmand
M. Dousti
H. Moradi
147
23
0
12 Sep 2021
COSMic: A Coherence-Aware Generation Metric for Image Descriptions
Mert Inan
P. Sharma
Baber Khalid
Radu Soricut
Matthew Stone
Malihe Alikhani
EGVM
156
14
0
11 Sep 2021
A Survey on Multi-modal Summarization
Anubhav Jangra
Sourajit Mukherjee
Adam Jatowt
S. Saha
M. Hasanuzzaman
206
79
0
11 Sep 2021
MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets
Shraman Pramanick
Shivam Sharma
Dimitar Dimitrov
Md. Shad Akhtar
Preslav Nakov
Tanmoy Chakraborty
224
169
0
11 Sep 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
611
489
0
10 Sep 2021
Panoptic Narrative Grounding
IEEE International Conference on Computer Vision (ICCV), 2021
Cristina González
Nicolás Ayobi
Isabela Hernández
José Hernández
Jordi Pont-Tuset
Pablo Arbeláez
258
28
0
10 Sep 2021
We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics
Simon Dobnik
R. Cooper
Adam Ek
Bill Noble
Staffan Larsson
N. Ilinykh
Vladislav Maraev
Vidya Somashekarappa
138
0
0
10 Sep 2021
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
H. Khan
D. Gupta
Asif Ekbal
166
17
0
10 Sep 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Stella Frank
Emanuele Bugliarello
Desmond Elliott
184
94
0
09 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
German Conference on Pattern Recognition (DAGM), 2021
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
130
2
0
09 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Computer Vision and Pattern Recognition (CVPR), 2021
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
589
46
0
09 Sep 2021
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
AAAI Conference on Artificial Intelligence (AAAI), 2021
Steven Y. Feng
Kevin Lu
Zhuofu Tao
Malihe Alikhani
Teruko Mitamura
Eduard H. Hovy
Varun Gangal
LRM
226
14
0
08 Sep 2021
Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Chenyu You
Polydoros Giannouris
Yuexian Zou
SSL
217
65
0
08 Sep 2021
Learning grounded word meaning representations on similarity graphs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Mariella Dimiccoli
H. Wendt
Pau Batlle
158
1
0
07 Sep 2021
CTRL-C: Camera calibration TRansformer with Line-Classification
IEEE International Conference on Computer Vision (ICCV), 2021
Jinwoo Lee
Hyun-Young Go
Hyunjoon Lee
Sunghyun Cho
Minhyuk Sung
Junho Kim
ViT
208
46
0
06 Sep 2021
Learning to Generate Scene Graph from Natural Language Supervision
Yiwu Zhong
Jing Shi
Jianwei Yang
Chenliang Xu
Yin Li
SSL
263
85
0
06 Sep 2021
Data Efficient Masked Language Modeling for Vision and Language
Yonatan Bitton
Gabriel Stanovsky
Michael Elhadad
Roy Schwartz
VLM
235
21
0
05 Sep 2021
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation
Mohammad Abuzar Shaikh
Zhanghexuan Ji
Dana Moukheiber
Yan Shen
S. Srihari
Mingchen Gao
VLM
171
1
0
04 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
163
19
0
04 Sep 2021
Supervised Contrastive Learning for Multimodal Unreliable News Detection in COVID-19 Pandemic
Wenjia Zhang
Lin Gui
Yulan He
141
43
0
04 Sep 2021
Previous
1
2
3
...
34
35
36
...
43
44
45
Next
Page 35 of 45
Page
of 45
Go