1505.04870
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
Papers citing
"Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"
50 / 1,325 papers shown
OptiBox: Breaking the Limits of Proposals for Visual Grounding
Zicong Fan
S. Meng
Leonid Sigal
James J. Little
ObjD
140
0
0
29 Nov 2019
Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA
Mikhail Fain
Niall Twomey
Andrey Ponikar
Ryan Fox
Danushka Bollegala
235
20
0
28 Nov 2019
Learning Cross-modal Context Graph for Visual Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Yongfei Liu
Bo Wan
Xiao-Dan Zhu
Xuming He
272
98
0
20 Nov 2019
Ladder Loss for Coherent Visual-Semantic Embedding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Mo Zhou
Zhenxing Niu
Le Wang
Zhanning Gao
Qilin Zhang
G. Hua
282
45
0
18 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2019
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI
AI4TS
325
408
0
10 Nov 2019
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Neural Information Processing Systems (NeurIPS), 2019
Fuwen Tan
Paola Cascante-Bonilla
Xiaoxiao Guo
Hui Wu
Song Feng
Vicente Ordonez
166
33
0
10 Nov 2019
Contextual Grounding of Natural Language Entities in Images
Farley Lai
Ning Xie
Derek Doran
Asim Kadav
ObjD
107
6
0
05 Nov 2019
Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships
International Conference on Learning Representations (ICLR), 2019
G. S. Kenigsfield
Ran El-Yaniv
123
2
0
27 Oct 2019
REMIND Your Neural Network to Prevent Catastrophic Forgetting
European Conference on Computer Vision (ECCV), 2019
Tyler L. Hayes
Kushal Kafle
Robik Shrestha
Manoj Acharya
Christopher Kanan
CLL
441
330
0
06 Oct 2019
UNITER: UNiversal Image-TExt Representation Learning
European Conference on Computer Vision (ECCV), 2019
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM
OT
374
465
0
25 Sep 2019
Visually Grounded Generation of Entailments from Premises
International Conference on Natural Language Generation (INLG), 2019
Somayeh Jafaritazehjani
Albert Gatt
Marc Tanti
LRM
123
1
0
21 Sep 2019
ContCap: A scalable framework for continual image captioning
Giang Nguyen
Tae Joon Jun
T. Tran
Tolcha Yalew
Daeyoung Kim
VLM
CLL
118
13
0
19 Sep 2019
Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings
Shweta Mahajan
Teresa Botschen
Iryna Gurevych
Stefan Roth
107
8
0
14 Sep 2019
MULE: Multimodal Universal Language Embedding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
198
43
0
08 Sep 2019
Do Cross Modal Systems Leverage Semantic Relationships?
Shah Nawaz
Muhammad Kamran Janjua
I. Gallo
Arif Mahmood
Alessandro Calefati
Faisal Shafait
115
9
0
03 Sep 2019
Phrase Grounding by Soft-Label Chain Conditional Random Field
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Hamish Ivison
Anjali Narayan-Chen
120
10
0
01 Sep 2019
Aesthetic Image Captioning From Weakly-Labelled Photographs
Koustav Ghosal
A. Rana
A. Smolic
198
29
0
29 Aug 2019
Probing Representations Learned by Multimodal Recurrent and Transformer Models
Jindrich Libovický
Pranava Madhyastha
135
1
0
29 Aug 2019
Adversarial Representation Learning for Text-to-Image Matching
IEEE International Conference on Computer Vision (ICCV), 2019
N. Sarafianos
Xiang Xu
I. Kakadiaris
GAN
268
217
0
28 Aug 2019
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
IEEE International Conference on Computer Vision (ICCV), 2019
Iro Laina
Christian Rupprecht
Nassir Navab
SSL
186
112
0
25 Aug 2019
Phrase Localization Without Paired Training Examples
IEEE International Conference on Computer Vision (ICCV), 2019
Josiah Wang
Lucia Specia
129
49
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
IEEE International Conference on Computer Vision (ICCV), 2019
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
250
173
0
20 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
IEEE International Conference on Computer Vision (ICCV), 2019
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
270
428
0
18 Aug 2019
Language Features Matter: Effective Language Representations for Vision-Language Tasks
IEEE International Conference on Computer Vision (ICCV), 2019
Andrea Burns
Reuben Tan
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
165
28
0
17 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
613
2,211
0
09 Aug 2019
Semi Supervised Phrase Localization in a Bidirectional Caption-Image Retrieval Framework
Deepan Das
Noor Mohammed Ghouse
Shashank Verma
Yin Li
120
0
0
08 Aug 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Journal of Artificial Intelligence Research (JAIR), 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
416
143
0
22 Jul 2019
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
Journal of Computacion y Sistemas (JCYS), 2019
Shantipriya Parida
Ondrej Bojar
S. Dash
180
67
0
21 Jul 2019
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Yulei Niu
Hanwang Zhang
Zhiwu Lu
Shih-Fu Chang
ObjD
BDL
171
31
0
08 Jul 2019
Distilling Translations with Visual Awareness
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Julia Ive
Pranava Madhyastha
Lucia Specia
VLM
263
85
0
18 Jun 2019
Expressing Visual Relationships via Language
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Hao Tan
Franck Dernoncourt
Zhe Lin
Trung Bui
Joey Tianyi Zhou
242
78
0
18 Jun 2019
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Zhenfang Chen
Lin Ma
Tong Lu
Kwan-Yee K. Wong
272
111
0
06 Jun 2019
The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
J. Haber
Tim Baumgärtner
Ece Takmaz
Lieke Gelderloos
Elia Bruni
Raquel Fernández
182
85
0
04 Jun 2019
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain
Automatic Speech Recognition & Understanding (ASRU), 2019
Johanes Effendi
Andros Tjandra
S. Sakti
Satoshi Nakamura
161
4
0
03 Jun 2019
Stochastic Generalized Adversarial Label Learning
Chidubem Arachie
Bert Huang
NoLa
106
0
0
03 Jun 2019
Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma
Yannis Kalantidis
Ghassan AlRegib
Peter Vajda
Marcus Rohrbach
Z. Kira
SSL
397
10
0
01 Jun 2019
Interactive-predictive neural multimodal systems
Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA), 2019
Álvaro Peris
F. Casacuberta
KELM
HAI
134
2
0
30 May 2019
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Zih-Siou Hung
Arun Mallya
Svetlana Lazebnik
ViT
216
15
0
28 May 2019
Don't Blame Distributional Semantics if it can't do Entailment
International Conference on Computational Semantics (IWCS), 2019
M. Westera
Gemma Boleda
CoGe
148
21
0
17 May 2019
Deep Metric Learning Beyond Binary Supervision
Sungyeon Kim
Minkyo Seo
Ivan Laptev
Minsu Cho
Suha Kwak
SSL
149
102
0
21 Apr 2019
Saliency-Guided Attention Network for Image-Sentence Matching
Zhong Ji
Haoran Wang
Jiawei Han
Yanwei Pang
173
95
0
20 Apr 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel
Lillian Lee
David M. Mimno
162
31
0
16 Apr 2019
Natural Language Semantics With Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics
David Schlangen
122
6
0
15 Apr 2019
Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions
Peratham Wiriyathammabhum
Abhinav Shrivastava
Vlad I. Morariu
L. Davis
121
5
0
08 Apr 2019
Modularized Textual Grounding for Counterfactual Resilience
Zhiyuan Fang
Shu Kong
Charless C. Fowlkes
Yezhou Yang
194
33
0
07 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
506
648
0
06 Apr 2019
Good News, Everyone! Context driven entity-aware captioning for news images
Ali Furkan Biten
Lluís Gómez
Marçal Rusiñol
Dimosthenis Karatzas
192
156
0
02 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
205
112
0
27 Mar 2019
Probing the Need for Visual Context in Multimodal Machine Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2019
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
Loïc Barrault
176
153
0
20 Mar 2019
Neural Sequential Phrase Grounding (SeqGROUND)
Computer Vision and Pattern Recognition (CVPR), 2019
Pelin Dogan
Leonid Sigal
Markus Gross
ObjD
217
54
0
18 Mar 2019