ResearchTrend.AI

© 2026 ResearchTrend.AI, All rights reserved.

Versions: v1, v2, v3, v4 (latest)

Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik
arXiv:1505.04870

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

50 / 1,326 papers shown
Semi-supervised Visual Feature Integration for Pre-trained Language Models
Lisai Zhang
Qingcai Chen
Dongfang Li
Buzhou Tang
VLM
246
1
0
01 Dec 2019
OptiBox: Breaking the Limits of Proposals for Visual Grounding
Zicong Fan
S. Meng
Leonid Sigal
James J. Little
ObjD
178
0
0
29 Nov 2019
Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA
Mikhail Fain
Niall Twomey
Andrey Ponikar
Ryan Fox
Danushka Bollegala
250
20
0
28 Nov 2019
Learning Cross-modal Context Graph for Visual Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Yongfei Liu
Bo Wan
Xiao-Dan Zhu
Xuming He
281
99
0
20 Nov 2019
Ladder Loss for Coherent Visual-Semantic Embedding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Mo Zhou
Zhenxing Niu
Le Wang
Zhanning Gao
Qilin Zhang
G. Hua
301
45
0
18 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2019
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI, AI4TS
325
416
0
10 Nov 2019
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Neural Information Processing Systems (NeurIPS), 2019
Fuwen Tan
Paola Cascante-Bonilla
Xiaoxiao Guo
Hui Wu
Song Feng
Vicente Ordonez
202
33
0
10 Nov 2019
Contextual Grounding of Natural Language Entities in Images
Farley Lai
Ning Xie
Derek Doran
Asim Kadav
ObjD
147
6
0
05 Nov 2019
Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships
International Conference on Learning Representations (ICLR), 2019
G. S. Kenigsfield
Ran El-Yaniv
125
2
0
27 Oct 2019
REMIND Your Neural Network to Prevent Catastrophic Forgetting
European Conference on Computer Vision (ECCV), 2019
Tyler L. Hayes
Kushal Kafle
Robik Shrestha
Manoj Acharya
Christopher Kanan
CLL
441
333
0
06 Oct 2019
UNITER: UNiversal Image-TExt Representation Learning
European Conference on Computer Vision (ECCV), 2019
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM, OT
399
469
0
25 Sep 2019
Visuallly Grounded Generation of Entailments from Premises
International Conference on Natural Language Generation (INLG), 2019
Somayeh Jafaritazehjani
Albert Gatt
Marc Tanti
LRM
124
1
0
21 Sep 2019
ContCap: A scalable framework for continual image captioning
Giang Nguyen
Tae Joon Jun
T. Tran
Tolcha Yalew
Daeyoung Kim
VLM, CLL
118
13
0
19 Sep 2019
Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings
Shweta Mahajan
Teresa Botschen
Iryna Gurevych
Stefan Roth
112
8
0
14 Sep 2019
MULE: Multimodal Universal Language Embedding
AAAI Conference on Artificial Intelligence (AAAI), 2019
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
210
43
0
08 Sep 2019
Do Cross Modal Systems Leverage Semantic Relationships?
Shah Nawaz
Muhammad Kamran Janjua
I. Gallo
Arif Mahmood
Alessandro Calefati
Faisal Shafait
136
9
0
03 Sep 2019
Phrase Grounding by Soft-Label Chain Conditional Random Field
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Hamish Ivison
Anjali Narayan-Chen
128
10
0
01 Sep 2019
Aesthetic Image Captioning From Weakly-Labelled Photographs
Koustav Ghosal
A. Rana
A. Smolic
274
30
0
29 Aug 2019
Probing Representations Learned by Multimodal Recurrent and Transformer Models
Jindrich Libovický
Pranava Madhyastha
146
1
0
29 Aug 2019
Adversarial Representation Learning for Text-to-Image Matching
IEEE International Conference on Computer Vision (ICCV), 2019
N. Sarafianos
Xiang Xu
I. Kakadiaris
GAN
278
221
0
28 Aug 2019
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
IEEE International Conference on Computer Vision (ICCV), 2019
Iro Laina
Christian Rupprecht
Nassir Navab
SSL
221
112
0
25 Aug 2019
Phrase Localization Without Paired Training Examples
IEEE International Conference on Computer Vision (ICCV), 2019
Josiah Wang
Lucia Specia
138
49
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
IEEE International Conference on Computer Vision (ICCV), 2019
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
261
173
0
20 Aug 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
IEEE International Conference on Computer Vision (ICCV), 2019
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
305
432
0
18 Aug 2019
Language Features Matter: Effective Language Representations for Vision-Language Tasks
IEEE International Conference on Computer Vision (ICCV), 2019
Andrea Burns
Reuben Tan
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
192
28
0
17 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
745
2,227
0
09 Aug 2019
Semi Supervised Phrase Localization in a Bidirectional Caption-Image Retrieval Framework
Deepan Das
Noor Mohammed Ghouse
Shashank Verma
Yin Li
123
0
0
08 Aug 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Journal of Artificial Intelligence Research (JAIR), 2019
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
419
143
0
22 Jul 2019
Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation
Journal of Computacion y Sistemas (JCYS), 2019
Shantipriya Parida
Ondrej Bojar
S. Dash
192
66
0
21 Jul 2019
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Yulei Niu
Hanwang Zhang
Zhiwu Lu
Shih-Fu Chang
ObjD, BDL
171
31
0
08 Jul 2019
Distilling Translations with Visual Awareness
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Julia Ive
Pranava Madhyastha
Lucia Specia
VLM
270
85
0
18 Jun 2019
Expressing Visual Relationships via Language
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Hao Tan
Franck Dernoncourt
Zhe Lin
Trung Bui
Joey Tianyi Zhou
246
81
0
18 Jun 2019
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
Zhenfang Chen
Lin Ma
Tong Lu
Kwan-Yee K. Wong
306
111
0
06 Jun 2019
The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue
Annual Meeting of the Association for Computational Linguistics (ACL), 2019
J. Haber
Tim Baumgärtner
Ece Takmaz
Lieke Gelderloos
Elia Bruni
Raquel Fernández
186
85
0
04 Jun 2019
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain
Automatic Speech Recognition & Understanding (ASRU), 2019
Johanes Effendi
Andros Tjandra
S. Sakti
Satoshi Nakamura
162
4
0
03 Jun 2019
Stochastic Generalized Adversarial Label Learning
Chidubem Arachie
Bert Huang
NoLa
109
0
0
03 Jun 2019
Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma
Yannis Kalantidis
Ghassan AlRegib
Peter Vajda
Marcus Rohrbach
Z. Kira
SSL
474
10
0
01 Jun 2019
Interactive-predictive neural multimodal systems
Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA), 2019
Álvaro Peris
F. Casacuberta
KELM, HAI
201
2
0
30 May 2019
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Zih-Siou Hung
Arun Mallya
Svetlana Lazebnik
ViT
233
15
0
28 May 2019
Don't Blame Distributional Semantics if it can't do Entailment
International Conference on Computational Semantics (IWCS), 2019
M. Westera
Gemma Boleda
CoGe
166
21
0
17 May 2019
Deep Metric Learning Beyond Binary Supervision
Sungyeon Kim
Minkyo Seo
Ivan Laptev
Minsu Cho
Suha Kwak
SSL
154
103
0
21 Apr 2019
Saliency-Guided Attention Network for Image-Sentence Matching
Zhong Ji
Haoran Wang
Jiawei Han
Yanwei Pang
250
95
0
20 Apr 2019
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents
Jack Hessel
Lillian Lee
David M. Mimno
171
31
0
16 Apr 2019
Natural Language Semantics With Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics
David Schlangen
155
6
0
15 Apr 2019
Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions
Peratham Wiriyathammabhum
Abhinav Shrivastava
Vlad I. Morariu
L. Davis
155
5
0
08 Apr 2019
Modularized Textual Grounding for Counterfactual Resilience
Zhiyuan Fang
Shu Kong
Charless C. Fowlkes
Yezhou Yang
212
33
0
07 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
536
654
0
06 Apr 2019
Good News, Everyone! Context driven entity-aware captioning for news images
Ali Furkan Biten
Lluís Gómez
Marçal Rusiñol
Dimosthenis Karatzas
197
156
0
02 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
211
112
0
27 Mar 2019
Probing the Need for Visual Context in Multimodal Machine Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2019
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
Loïc Barrault
217
154
0
20 Mar 2019
Page 24 of 27