Microsoft COCO Captions: Data Collection and Evaluation Server

1 April 2015
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick

Papers citing "Microsoft COCO Captions: Data Collection and Evaluation Server"

50 / 1,515 papers shown
In Defense of Grid Features for Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2020
Huaizu Jiang
Ishan Misra
Marcus Rohrbach
Erik Learned-Miller
Xinlei Chen
OOD, ObjD
287
350
0
10 Jan 2020
All-in-One Image-Grounded Conversational Agents
Da Ju
Kurt Shuster
Y-Lan Boureau
Jason Weston
LLMAG
133
9
0
28 Dec 2019
DDI-100: Dataset for Text Detection and Recognition
I. Zharikov
Filipp Nikitin
I. Vasiliev
V. Dokholyan
160
17
0
25 Dec 2019
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao
Junyang Lin
Zhiyuan Zhang
Xuancheng Ren
Qi Su
Xu Sun
138
133
0
25 Dec 2019
Multimodal Generative Models for Compositional Representation Learning
Mike Wu
Noah D. Goodman
GAN, DRL
157
20
0
11 Dec 2019
Controlling Style and Semantics in Weakly-Supervised Image Generation
European Conference on Computer Vision (ECCV), 2019
Dario Pavllo
Aurelien Lucchi
Thomas Hofmann
170
35
0
06 Dec 2019
Connecting Vision and Language with Localized Narratives
European Conference on Computer Vision (ECCV), 2019
Jordi Pont-Tuset
J. Uijlings
Soravit Changpinyo
Radu Soricut
V. Ferrari
ObjD
402
283
0
06 Dec 2019
12-in-1: Multi-Task Vision and Language Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2019
Jiasen Lu
Vedanuj Goswami
Marcus Rohrbach
Devi Parikh
Stefan Lee
VLM, ObjD
271
498
0
05 Dec 2019
Exposing and Correcting the Gender Bias in Image Captioning Datasets and Models
Shruti Bhargava
David A. Forsyth
FaML
140
55
0
02 Dec 2019
Multimodal Machine Translation through Visuals and Speech
Machine Translation (MT), 2019
U. Sulubacak
Ozan Caglayan
Stig-Arne Gronroos
Aku Rouhe
Desmond Elliott
Lucia Specia
Jörg Tiedemann
188
85
0
28 Nov 2019
Non-Autoregressive Coarse-to-Fine Video Captioning
Bang-ju Yang
Yuexian Zou
Fenglin Liu
Can Zhang
352
11
0
27 Nov 2019
Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
223
7
0
26 Nov 2019
Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions
IEEE International Conference on Multimedia Big Data (ICMBD), 2019
Osaid Rehman Nasir
S. Jha
Manraj Singh Grover
Yi Yu
Ajit Kumar
R. Shah
CVBM, GAN
94
46
0
26 Nov 2019
Characterizing the impact of using features extracted from pre-trained models on the quality of video captioning sequence-to-sequence models
International Conferences on Pattern Recognition and Artificial Intelligence (ICCPRAI), 2019
Menatallh Hammad
May Hammad
Mohamed Elshenawy
81
2
0
22 Nov 2019
Continual adaptation for efficient machine communication
Conference on Computational Natural Language Learning (CoNLL), 2019
Robert D. Hawkins
Minae Kwon
Dorsa Sadigh
Noah D. Goodman
CLL
178
37
0
22 Nov 2019
Empirical Autopsy of Deep Video Captioning Frameworks
Nayyer Aafaq
Naveed Akhtar
Wei Liu
Lin Wang
115
6
0
21 Nov 2019
Ladder Loss for Coherent Visual-Semantic Embedding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Mo Zhou
Zhenxing Niu
Le Wang
Zhanning Gao
Qilin Zhang
G. Hua
245
44
0
18 Nov 2019
On Architectures for Including Visual Information in Neural Language Models for Image Description
Marc Tanti
Albert Gatt
K. Camilleri
VLM
100
2
0
09 Nov 2019
Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning
Ákos Kádár
Grzegorz Chrupała
Afra Alishahi
Desmond Elliott
168
1
0
09 Nov 2019
Contextual Grounding of Natural Language Entities in Images
Farley Lai
Ning Xie
Derek Doran
Asim Kadav
ObjD
83
6
0
05 Nov 2019
Sequence Modeling with Unconstrained Generation Order
Neural Information Processing Systems (NeurIPS), 2019
Dmitrii Emelianenko
Elena Voita
P. Serdyukov
217
18
0
01 Nov 2019
Hidden State Guidance: Improving Image Captioning using An Image Conditioned Autoencoder
Jialin Wu
Raymond J. Mooney
116
0
0
31 Oct 2019
PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
204
282
0
17 Oct 2019
Vatex Video Captioning Challenge 2020: Multi-View Features and Hybrid Reward Strategies for Video Captioning
Xinxin Zhu
A. Gorban
V. A. Makarov
Shichen Lu
I. Tyukin
Hanqing Lu
168
2
0
17 Oct 2019
Improving Question Generation With to the Point Context
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Jingjing Li
Yifan Gao
Lidong Bing
Irwin King
Michael R. Lyu
LRM
143
50
0
14 Oct 2019
Semantic-aware Image Deblurring
Fuhai Chen
Rongrong Ji
Chengpeng Dai
Xiaoshuai Sun
Chia-Wen Lin
Jiayi Ji
Baochang Zhang
Feiyue Huang
Liujuan Cao
BDL, VLM
151
7
0
09 Oct 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
AAAI Conference on Artificial Intelligence (AAAI), 2019
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM, VLM
597
1,001
0
24 Sep 2019
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
Conference on Natural Language Processing (NLP), 2019
Gregor Wiedemann
Steffen Remus
Avi Chawla
Chris Biemann
233
190
0
23 Sep 2019
Improving CNN-based Planar Object Detection with Geometric Prior Knowledge
IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), 2019
Jianxiong Cai
Jiawei Hou
Yiren Lu
Hongyu Chen
L. Kneip
Sören Schwertfeger
107
7
0
23 Sep 2019
Large-scale representation learning from visually grounded untranscribed speech
Conference on Computational Natural Language Learning (CoNLL), 2019
Gabriel Ilharco
Yuan Zhang
Jason Baldridge
SSL
123
63
0
19 Sep 2019
ContCap: A scalable framework for continual image captioning
Giang Nguyen
Tae Joon Jun
T. Tran
Tolcha Yalew
Daeyoung Kim
VLM, CLL
117
12
0
19 Sep 2019
Inverse Visual Question Answering with Multi-Level Attentions
Asian Conference on Machine Learning (ACML), 2019
Yaser Alwatter
Yuhong Guo
BDL
119
1
0
17 Sep 2019
Bridging Visual Perception with Contextual Semantics for Understanding Robot Manipulation Tasks
Chen Jiang
Martin Jägersand
163
4
0
16 Sep 2019
Compositional Generalization in Image Captioning
Conference on Computational Natural Language Learning (CoNLL), 2019
Mitja Nikolaus
Mostafa Abdou
Matthew Lamm
Rahul Aralikatte
Desmond Elliott
CoGe
223
49
0
10 Sep 2019
Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation
IEEE Access (IEEE Access), 2019
Wei Wei
Ling Cheng
Xian-Ling Mao
Guangyou Zhou
Feida Zhu
DiffM
147
24
0
05 Sep 2019
Reflective Decoding Network for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2019
Lei Ke
Wenjie Pei
Ruiyu Li
Xiaoyong Shen
Yu-Wing Tai
ObjD
158
104
0
30 Aug 2019
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
IEEE International Conference on Computer Vision (ICCV), 2019
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
206
176
0
27 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
International Conference on Learning Representations (ICLR), 2019
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM, MLLM, SSL
545
1,784
0
22 Aug 2019
ViCo: Word Embeddings from Visual Co-occurrences
IEEE International Conference on Computer Vision (ICCV), 2019
Tanmay Gupta
Alex Schwing
Derek Hoiem
119
25
0
22 Aug 2019
Phrase Localization Without Paired Training Examples
IEEE International Conference on Computer Vision (ICCV), 2019
Josiah Wang
Lucia Specia
102
49
0
20 Aug 2019
ARAML: A Stable Adversarial Training Framework for Text Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Pei Ke
Fei Huang
Shiyu Huang
Xiaoyan Zhu
GAN
114
24
0
20 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
IEEE International Conference on Computer Vision (ICCV), 2019
Shuang Ma
Daniel J. McDuff
Yale Song
162
28
0
19 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
AAAI Conference on Artificial Intelligence (AAAI), 2019
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL, VLM, MLLM
596
941
0
16 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
518
2,171
0
09 Aug 2019
Scene-based Factored Attention for Image Captioning
Chen Shen
Rongrong Ji
Fuhai Chen
Xiaoshuai Sun
Xiangming Li
119
0
0
07 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Neural Information Processing Systems (NeurIPS), 2019
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL, VLM
816
4,141
0
06 Aug 2019
Sound source detection, localization and classification using consecutive ensemble of CRNN models
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019
Slawomir Kapka
M. Lewandowski
205
72
0
02 Aug 2019
Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
International Joint Conference on Artificial Intelligence (IJCAI), 2019
Jing Wang
Yingwei Pan
Ting Yao
Jinhui Tang
Tao Mei
VLM, BDL, DiffM
130
38
0
01 Aug 2019
V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices
AAAI Conference on Artificial Intelligence (AAAI), 2019
Damien Teney
Peng Wang
Jiewei Cao
Lingqiao Liu
Chunhua Shen
Anton Van Den Hengel
121
35
0
29 Jul 2019
Multi-adversarial Faster-RCNN for Unrestricted Object Detection
IEEE International Conference on Computer Vision (ICCV), 2019
Zhenwei He
Lei Zhang
ObjD
245
344
0
24 Jul 2019