Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
arXiv:1505.04870

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

50 / 1,325 papers shown
OptiBox: Breaking the Limits of Proposals for Visual Grounding
Zicong Fan
S. Meng
Leonid Sigal
James J. Little
ObjD
140
0
0
29 Nov 2019
Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA
Mikhail Fain
Niall Twomey
Andrey Ponikar
Ryan Fox
Danushka Bollegala
235
20
0
28 Nov 2019
Learning Cross-modal Context Graph for Visual Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Yongfei Liu
Bo Wan
Xiao-Dan Zhu
Xuming He
272
98
0
20 Nov 2019
Ladder Loss for Coherent Visual-Semantic Embedding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Mo Zhou
Zhenxing Niu
Le Wang
Zhanning Gao
Qilin Zhang
G. Hua
282
45
0
18 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2019
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI, AI4TS
325
408
0
10 Nov 2019
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Neural Information Processing Systems (NeurIPS), 2019
Fuwen Tan
Paola Cascante-Bonilla
Xiaoxiao Guo
Hui Wu
Song Feng
Vicente Ordonez
166
33
0
10 Nov 2019
Contextual Grounding of Natural Language Entities in Images
Farley Lai
Ning Xie
Derek Doran
Asim Kadav
ObjD
107
6
0
05 Nov 2019
Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships
International Conference on Learning Representations (ICLR), 2019
G. S. Kenigsfield
Ran El-Yaniv
123
2
0
27 Oct 2019
REMIND Your Neural Network to Prevent Catastrophic Forgetting
European Conference on Computer Vision (ECCV), 2019
Tyler L. Hayes
Kushal Kafle
Robik Shrestha
Manoj Acharya
Christopher Kanan
CLL
441
330
0
06 Oct 2019
UNITER: UNiversal Image-TExt Representation Learning
European Conference on Computer Vision (ECCV), 2019
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM, OT
374
465
0
25 Sep 2019
Visually Grounded Generation of Entailments from Premises
International Conference on Natural Language Generation (INLG), 2019
Somayeh Jafaritazehjani
Albert Gatt
Marc Tanti
LRM
123
1
0
21 Sep 2019
ContCap: A scalable framework for continual image captioning
Giang Nguyen
Tae Joon Jun
T. Tran
Tolcha Yalew
Daeyoung Kim
VLM, CLL
118
13
0
19 Sep 2019
Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings
Shweta Mahajan
Teresa Botschen
Iryna Gurevych
Stefan Roth
107
8
0
14 Sep 2019
MULE: Multimodal Universal Language Embedding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
198
43
0
08 Sep 2019
Do Cross Modal Systems Leverage Semantic Relationships?
Shah Nawaz
Muhammad Kamran Janjua
I. Gallo
Arif Mahmood
Alessandro Calefati
Faisal Shafait
115
9
0
03 Sep 2019
Phrase Grounding by Soft-Label Chain Conditional Random Field
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Hamish Ivison
Anjali Narayan-Chen
120
10
0
01 Sep 2019
Aesthetic Image Captioning From Weakly-Labelled Photographs
Koustav Ghosal
A. Rana
A. Smolic
198
29
0
29 Aug 2019
Probing Representations Learned by Multimodal Recurrent and Transformer Models
Jindrich Libovický
Pranava Madhyastha
135
1
0
29 Aug 2019
Adversarial Representation Learning for Text-to-Image Matching
IEEE International Conference on Computer Vision (ICCV), 2019
N. Sarafianos
Xiang Xu
I. Kakadiaris
GAN
268
217
0
28 Aug 2019
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
IEEE International Conference on Computer Vision (ICCV), 2019
Iro Laina
Christian Rupprecht
Nassir Navab
SSL
186
112
0
25 Aug 2019
Phrase Localization Without Paired Training Examples
IEEE International Conference on Computer Vision (ICCV), 2019
Josiah Wang
Lucia Specia
129
49
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
IEEE International Conference on Computer Vision (ICCV), 2019
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
250
173
0
20 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
IEEE International Conference on Computer Vision (ICCV), 2019
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
270
428
0
18 Aug 2019
Language Features Matter: Effective Language Representations for Vision-Language Tasks
IEEE International Conference on Computer Vision (ICCV), 2019
Andrea Burns
Reuben Tan
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
165
28
0
17 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
613
2,211
0
09 Aug 2019
Semi Supervised Phrase Localization in a Bidirectional Caption-Image Retrieval Framework
Deepan Das
Noor Mohammed Ghouse
Shashank Verma
Yin Li
120
0
0
08 Aug 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Journal of Artificial Intelligence Research (JAIR), 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
416
143
0
22 Jul 2019
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
Journal of Computacion y Sistemas (JCYS), 2019
Shantipriya Parida
Ondrej Bojar
S. Dash
180
67
0
21 Jul 2019
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Yulei Niu
Hanwang Zhang
Zhiwu Lu
Shih-Fu Chang
ObjD, BDL
171
31
0
08 Jul 2019
Distilling Translations with Visual Awareness
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Julia Ive
Pranava Madhyastha
Lucia Specia
VLM
263
85
0
18 Jun 2019
Expressing Visual Relationships via Language
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Hao Tan
Franck Dernoncourt
Zhe Lin
Trung Bui
Joey Tianyi Zhou
242
78
0
18 Jun 2019
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Zhenfang Chen
Lin Ma
Tong Lu
Kwan-Yee K. Wong
272
111
0
06 Jun 2019
The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
J. Haber
Tim Baumgärtner
Ece Takmaz
Lieke Gelderloos
Elia Bruni
Raquel Fernández
182
85
0
04 Jun 2019
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain
Automatic Speech Recognition & Understanding (ASRU), 2019
Johanes Effendi
Andros Tjandra
S. Sakti
Satoshi Nakamura
161
4
0
03 Jun 2019
Stochastic Generalized Adversarial Label Learning
Chidubem Arachie
Bert Huang
NoLa
106
0
0
03 Jun 2019
Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma
Yannis Kalantidis
Ghassan AlRegib
Peter Vajda
Marcus Rohrbach
Z. Kira
SSL
397
10
0
01 Jun 2019
Interactive-predictive neural multimodal systems
Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA), 2019
Álvaro Peris
F. Casacuberta
KELM, HAI
134
2
0
30 May 2019
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Zih-Siou Hung
Arun Mallya
Svetlana Lazebnik
ViT
216
15
0
28 May 2019
Don't Blame Distributional Semantics if it can't do Entailment
International Conference on Computational Semantics (IWCS), 2019
M. Westera
Gemma Boleda
CoGe
148
21
0
17 May 2019
Deep Metric Learning Beyond Binary Supervision
Sungyeon Kim
Minkyo Seo
Ivan Laptev
Minsu Cho
Suha Kwak
SSL
149
102
0
21 Apr 2019
Saliency-Guided Attention Network for Image-Sentence Matching
Zhong Ji
Haoran Wang
Jiawei Han
Yanwei Pang
173
95
0
20 Apr 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel
Lillian Lee
David M. Mimno
162
31
0
16 Apr 2019
Natural Language Semantics With Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics
David Schlangen
122
6
0
15 Apr 2019
Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions
Peratham Wiriyathammabhum
Abhinav Shrivastava
Vlad I. Morariu
L. Davis
121
5
0
08 Apr 2019
Modularized Textual Grounding for Counterfactual Resilience
Zhiyuan Fang
Shu Kong
Charless C. Fowlkes
Yezhou Yang
194
33
0
07 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
506
648
0
06 Apr 2019
Good News, Everyone! Context driven entity-aware captioning for news images
Ali Furkan Biten
Lluís Gómez
Marçal Rusiñol
Dimosthenis Karatzas
192
156
0
02 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
205
112
0
27 Mar 2019
Probing the Need for Visual Context in Multimodal Machine Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2019
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
Loïc Barrault
176
153
0
20 Mar 2019
Neural Sequential Phrase Grounding (SeqGROUND)
Computer Vision and Pattern Recognition (CVPR), 2019
Pelin Dogan
Leonid Sigal
Markus Gross
ObjD
217
54
0
18 Mar 2019