1504.00325
Microsoft COCO Captions: Data Collection and Evaluation Server
1 April 2015
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
Papers citing
"Microsoft COCO Captions: Data Collection and Evaluation Server"
50 / 1,515 papers shown
Title
In Defense of Grid Features for Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2020
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD
ObjD
287
350
0
10 Jan 2020
All-in-One Image-Grounded Conversational Agents
Da Ju
Kurt Shuster
Y-Lan Boureau
Jason Weston
LLMAG
133
9
0
28 Dec 2019
DDI-100: Dataset for Text Detection and Recognition
I. Zharikov
Filipp Nikitin
I. Vasiliev
V. Dokholyan
160
17
0
25 Dec 2019
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao
Junyang Lin
Zhiyuan Zhang
Xuancheng Ren
Qi Su
Xu Sun
138
133
0
25 Dec 2019
Multimodal Generative Models for Compositional Representation Learning
Mike Wu
Noah D. Goodman
GAN
DRL
157
20
0
11 Dec 2019
Controlling Style and Semantics in Weakly-Supervised Image Generation
European Conference on Computer Vision (ECCV), 2019
Dario Pavllo
Aurelien Lucchi
Thomas Hofmann
170
35
0
06 Dec 2019
Connecting Vision and Language with Localized Narratives
European Conference on Computer Vision (ECCV), 2019
Jordi Pont-Tuset
J. Uijlings
Soravit Changpinyo
Radu Soricut
V. Ferrari
ObjD
402
283
0
06 Dec 2019
12-in-1: Multi-Task Vision and Language Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2019
Jiasen Lu
Vedanuj Goswami
Marcus Rohrbach
Devi Parikh
Stefan Lee
VLM
ObjD
271
498
0
05 Dec 2019
Exposing and Correcting the Gender Bias in Image Captioning Datasets and Models
Shruti Bhargava
David A. Forsyth
FaML
140
55
0
02 Dec 2019
Multimodal Machine Translation through Visuals and Speech
Machine Translation (MT), 2019
U. Sulubacak
Ozan Caglayan
Stig-Arne Gronroos
Aku Rouhe
Desmond Elliott
Lucia Specia
Jörg Tiedemann
188
85
0
28 Nov 2019
Non-Autoregressive Coarse-to-Fine Video Captioning
Bang-ju Yang
Yuexian Zou
Fenglin Liu
Can Zhang
352
11
0
27 Nov 2019
Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
223
7
0
26 Nov 2019
Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions
IEEE International Conference on Multimedia Big Data (ICMBD), 2019
Osaid Rehman Nasir
S. Jha
Manraj Singh Grover
Yi Yu
Ajit Kumar
R. Shah
CVBM
GAN
94
46
0
26 Nov 2019
Characterizing the impact of using features extracted from pre-trained models on the quality of video captioning sequence-to-sequence models
International Conferences on Pattern Recognition and Artificial Intelligence (ICCPRAI), 2019
Menatallh Hammad
May Hammad
Mohamed Elshenawy
81
2
0
22 Nov 2019
Continual adaptation for efficient machine communication
Conference on Computational Natural Language Learning (CoNLL), 2019
Robert D. Hawkins
Minae Kwon
Dorsa Sadigh
Noah D. Goodman
CLL
178
37
0
22 Nov 2019
Empirical Autopsy of Deep Video Captioning Frameworks
Nayyer Aafaq
Naveed Akhtar
Wei Liu
Lin Wang
115
6
0
21 Nov 2019
Ladder Loss for Coherent Visual-Semantic Embedding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Mo Zhou
Zhenxing Niu
Le Wang
Zhanning Gao
Qilin Zhang
G. Hua
245
44
0
18 Nov 2019
On Architectures for Including Visual Information in Neural Language Models for Image Description
Marc Tanti
Albert Gatt
K. Camilleri
VLM
100
2
0
09 Nov 2019
Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning
Ákos Kádár
Grzegorz Chrupała
Afra Alishahi
Desmond Elliott
168
1
0
09 Nov 2019
Contextual Grounding of Natural Language Entities in Images
Farley Lai
Ning Xie
Derek Doran
Asim Kadav
ObjD
83
6
0
05 Nov 2019
Sequence Modeling with Unconstrained Generation Order
Neural Information Processing Systems (NeurIPS), 2019
Dmitrii Emelianenko
Elena Voita
P. Serdyukov
217
18
0
01 Nov 2019
Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder
Jialin Wu
Raymond J. Mooney
116
0
0
31 Oct 2019
PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
204
282
0
17 Oct 2019
Vatex Video Captioning Challenge 2020: Multi-View Features and Hybrid Reward Strategies for Video Captioning
Xinxin Zhu
A. Gorban
V. A. Makarov
Shichen Lu
I. Tyukin
Hanqing Lu
168
2
0
17 Oct 2019
Improving Question Generation With to the Point Context
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Jingjing Li
Yifan Gao
Lidong Bing
Irwin King
Michael R. Lyu
LRM
143
50
0
14 Oct 2019
Semantic-aware Image Deblurring
Fuhai Chen
Rongrong Ji
Chengpeng Dai
Xiaoshuai Sun
Chia-Wen Lin
Jiayi Ji
Baochang Zhang
Feiyue Huang
Liujuan Cao
BDL
VLM
151
7
0
09 Oct 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
AAAI Conference on Artificial Intelligence (AAAI), 2019
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
597
1,001
0
24 Sep 2019
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
Conference on Natural Language Processing (NLP), 2019
Gregor Wiedemann
Steffen Remus
Avi Chawla
Chris Biemann
233
190
0
23 Sep 2019
Improving CNN-based Planar Object Detection with Geometric Prior Knowledge
IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), 2019
Jianxiong Cai
Jiawei Hou
Yiren Lu
Hongyu Chen
L. Kneip
Sören Schwertfeger
107
7
0
23 Sep 2019
Large-scale representation learning from visually grounded untranscribed speech
Conference on Computational Natural Language Learning (CoNLL), 2019
Gabriel Ilharco
Yuan Zhang
Jason Baldridge
SSL
123
63
0
19 Sep 2019
ContCap: A scalable framework for continual image captioning
Giang Nguyen
Tae Joon Jun
T. Tran
Tolcha Yalew
Daeyoung Kim
VLM
CLL
117
12
0
19 Sep 2019
Inverse Visual Question Answering with Multi-Level Attentions
Asian Conference on Machine Learning (ACML), 2019
Yaser Alwatter
Yuhong Guo
BDL
119
1
0
17 Sep 2019
Bridging Visual Perception with Contextual Semantics for Understanding Robot Manipulation Tasks
Chen Jiang
Martin Jägersand
163
4
0
16 Sep 2019
Compositional Generalization in Image Captioning
Conference on Computational Natural Language Learning (CoNLL), 2019
Mitja Nikolaus
Mostafa Abdou
Matthew Lamm
Rahul Aralikatte
Desmond Elliott
CoGe
223
49
0
10 Sep 2019
Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation
IEEE Access (IEEE Access), 2019
Wei Wei
Ling Cheng
Xian-Ling Mao
Guangyou Zhou
Feida Zhu
DiffM
147
24
0
05 Sep 2019
Reflective Decoding Network for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2019
Lei Ke
Wenjie Pei
Ruiyu Li
Xiaoyong Shen
Yu-Wing Tai
ObjD
158
104
0
30 Aug 2019
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
IEEE International Conference on Computer Vision (ICCV), 2019
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
206
176
0
27 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
International Conference on Learning Representations (ICLR), 2019
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
545
1,784
0
22 Aug 2019
ViCo: Word Embeddings from Visual Co-occurrences
IEEE International Conference on Computer Vision (ICCV), 2019
Tanmay Gupta
Alex Schwing
Derek Hoiem
119
25
0
22 Aug 2019
Phrase Localization Without Paired Training Examples
IEEE International Conference on Computer Vision (ICCV), 2019
Josiah Wang
Lucia Specia
102
49
0
20 Aug 2019
ARAML: A Stable Adversarial Training Framework for Text Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Pei Ke
Fei Huang
Shiyu Huang
Xiaoyan Zhu
GAN
114
24
0
20 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
IEEE International Conference on Computer Vision (ICCV), 2019
Shuang Ma
Daniel J. McDuff
Yale Song
162
28
0
19 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
AAAI Conference on Artificial Intelligence (AAAI), 2019
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
596
941
0
16 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
518
2,171
0
09 Aug 2019
Scene-based Factored Attention for Image Captioning
Chen Shen
Rongrong Ji
Fuhai Chen
Xiaoshuai Sun
Xiangming Li
119
0
0
07 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
816
4,141
0
06 Aug 2019
Sound source detection, localization and classification using consecutive ensemble of CRNN models
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019
Slawomir Kapka
M. Lewandowski
205
72
0
02 Aug 2019
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
International Joint Conference on Artificial Intelligence (IJCAI), 2019
Jing Wang
Yingwei Pan
Ting Yao
Jinhui Tang
Tao Mei
VLM
BDL
DiffM
130
38
0
01 Aug 2019
V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices
AAAI Conference on Artificial Intelligence (AAAI), 2019
Damien Teney
Peng Wang
Jiewei Cao
Lingqiao Liu
Chunhua Shen
Anton Van Den Hengel
121
35
0
29 Jul 2019
Multi-adversarial Faster-RCNN for Unrestricted Object Detection
IEEE International Conference on Computer Vision (ICCV), 2019
Zhenwei He
Lei Zhang
ObjD
245
344
0
24 Jul 2019