Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Anjali Narayan-Chen
Svetlana Lazebnik

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

50 / 1,325 papers shown
DR-BiLSTM: Dependent Reading Bidirectional LSTM for Natural Language Inference
Reza Ghaeini
Sadid A. Hasan
Vivek Datla
Joey Liu
Kathy Lee
Ashequl Qadir
Yuan Ling
Aaditya (Adi) Prakash
Xiaoli Z. Fern
Oladimeji Farri
220
104
0
15 Feb 2018
TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays
Xiaosong Wang
Yifan Peng
Le Lu
Zhiyong Lu
Ronald M. Summers
MedIm
192
521
0
12 Jan 2018
Object Referring in Videos with Language and Human Gaze
A. Vasudevan
Dengxin Dai
Luc Van Gool
VOS
204
82
0
04 Jan 2018
Learning Semantic Concepts and Order for Image and Sentence Matching
Yan Huang
Qi Wu
Liang Wang
VLM
198
322
0
06 Dec 2017
Grounding Referring Expressions in Images by Variational Context
Hanwang Zhang
Yulei Niu
Shih-Fu Chang
BDL
ObjD
268
237
0
05 Dec 2017
Discriminative Learning of Open-Vocabulary Object Retrieval and Localization by Negative Phrase Augmentation
Ryota Hinami
Shin'ichi Satoh
ObjD
123
24
0
27 Nov 2017
Conditional Image-Text Embedding Networks
Bryan A. Plummer
Paige Kordas
M. Kiapour
Shuai Zheng
Robinson Piramuthu
Svetlana Lazebnik
358
124
0
22 Nov 2017
Excitation Backprop for RNNs
Sarah Adel Bargal
Andrea Zunino
Donghyun Kim
Jianming Zhang
Vittorio Murino
Stan Sclaroff
279
48
0
18 Nov 2017
Neural Motifs: Scene Graph Parsing with Global Context
Rowan Zellers
Mark Yatskar
Sam Thomson
Yejin Choi
GNN
300
1,098
0
17 Nov 2017
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
Jiuxiang Gu
Jianfei Cai
Shafiq Joty
Li Niu
G. Wang
VLM
312
381
0
17 Nov 2017
Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries
Bohan Zhuang
Qi Wu
Chunhua Shen
Ian Reid
Anton Van Den Hengel
ObjD
186
143
0
17 Nov 2017
Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks
Aditya Chattopadhyay
Anirban Sarkar
Prantik Howlader
V. Balasubramanian
FAtt
444
2,828
0
30 Oct 2017
Describing Natural Images Containing Novel Objects with Knowledge Guided Assitance
Aditya Mogadala
Umanga Bista
Lexing Xie
Achim Rettinger
166
7
0
17 Oct 2017
Visual Reasoning with Natural Language
Stephanie Zhou
Alane Suhr
Yoav Artzi
85
4
0
02 Oct 2017
Predicting Visual Features from Text for Image and Video Caption Retrieval
Jianfeng Dong
Xirong Li
Cees G. M. Snoek
236
238
0
05 Sep 2017
Link the head to the "beak": Zero Shot Learning from Noisy Text Description at Part Precision
Mohamed Elhoseiny
Yizhe Zhu
Han Zhang
Ahmed Elgammal
VLM
251
143
0
04 Sep 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
IEEE International Conference on Computer Vision (ICCV), 2017
Chuang Gan
Yandong Li
Haoxiang Li
Chen Sun
Boqing Gong
253
136
0
15 Aug 2017
Query-guided Regression Network with Context Policy for Phrase Grounding
Kan Chen
Rama Kovvuri
Ram Nevatia
168
145
0
04 Aug 2017
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
410
1,111
0
04 Aug 2017
Discover and Learn New Objects from Documentaries
Kai-xiang Chen
Hang Song
Chen Change Loy
Dahua Lin
ObjD
160
20
0
30 Jul 2017
Weakly-supervised learning of visual relations
Julia Peyre
Ivan Laptev
Cordelia Schmid
Josef Sivic
190
199
0
29 Jul 2017
Image Pivoting for Learning Multilingual Multimodal Representations
Spandana Gella
Rico Sennrich
Frank Keller
Mirella Lapata
SSL
154
79
0
24 Jul 2017
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts
Xuwang Yin
Vicente Ordonez
VLM
184
55
0
22 Jul 2017
CUNI System for the WMT17 Multimodal Translation Task
Jindřich Helcl
Jindřich Libovický
133
11
0
14 Jul 2017
Identifying Spatial Relations in Images using Convolutional Neural Networks
IEEE International Joint Conference on Neural Network (IJCNN), 2017
Mandar Haldekar
Ashwinkumar Ganesan
Tim Oates
104
42
0
13 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
584
3,662
0
26 May 2017
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
Xiaosong Wang
Yifan Peng
Le Lu
Zhiyong Lu
M. Bagheri
Ronald M. Summers
LM&MA
790
3,073
0
05 May 2017
TALL: Temporal Activity Localization via Language Query
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
496
1,004
0
05 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
169
143
0
03 May 2017
Spatio-temporal Person Retrieval via Natural Language Queries
Masataka Yamaguchi
Kuniaki Saito
Yoshitaka Ushiku
Tatsuya Harada
227
63
0
26 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
283
531
0
11 Apr 2017
Detecting Visual Relationships with Deep Relational Networks
Bo Dai
Yuqi Zhang
Dahua Lin
GNN
242
517
0
11 Apr 2017
Generating Descriptions with Grounded and Co-Referenced People
Anna Rohrbach
Marcus Rohrbach
Siyu Tang
Seong Joon Oh
Bernt Schiele
588
72
0
05 Apr 2017
Weakly Supervised Dense Video Captioning
Zhiqiang Shen
Jianguo Li
Zhou Su
Minjun Li
Yurong Chen
Yu-Gang Jiang
Xiangyang Xue
188
140
0
05 Apr 2017
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
Tanmay Gupta
Kevin J. Shih
Saurabh Singh
Derek Hoiem
287
26
0
02 Apr 2017
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
De-An Huang
Joseph J. Lim
Li Fei-Fei
Juan Carlos Niebles
188
55
0
07 Mar 2017
Visual Translation Embedding Network for Visual Relation Detection
Computer Vision and Pattern Recognition (CVPR), 2017
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
401
584
0
27 Feb 2017
On the Origin of Deep Learning
Haohan Wang
Bhiksha Raj
MedIm
3DV
VLM
354
235
0
24 Feb 2017
Learning to Detect Human-Object Interactions
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2017
Yu-Wei Chao
Yunfan Liu
Michael Xieyang Liu
Huayi Zeng
Gaowen Liu
246
585
0
17 Feb 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Computer Vision and Pattern Recognition (CVPR), 2016
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
209
289
0
30 Dec 2016
Top-down Visual Saliency Guided by Captions
Computer Vision and Pattern Recognition (CVPR), 2016
Vasili Ramanishka
Abir Das
Jianming Zhang
Kate Saenko
173
148
0
21 Dec 2016
An Empirical Study of Language CNN for Image Captioning
IEEE International Conference on Computer Vision (ICCV), 2016
Jiuxiang Gu
G. Wang
Jianfei Cai
Tsuhan Chen
276
149
0
21 Dec 2016
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM
SSeg
198
169
0
05 Dec 2016
Areas of Attention for Image Captioning
M. Pedersoli
Thomas Lucas
Cordelia Schmid
Jakob Verbeek
298
217
0
03 Dec 2016
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
399
1,063
0
26 Nov 2016
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
Bryan A. Plummer
Arun Mallya
Christopher M. Cervantes
Anjali Narayan-Chen
Svetlana Lazebnik
368
191
0
21 Nov 2016
Instance-aware Image and Sentence Matching with Selective Multimodal LSTM
Yan Huang
Wei Wang
Liang Wang
216
229
0
17 Nov 2016
A Semi-supervised Framework for Image Captioning
Wenhu Chen
Aurelien Lucchi
Thomas Hofmann
215
9
0
16 Nov 2016
Dual Attention Networks for Multimodal Reasoning and Matching
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
232
703
0
02 Nov 2016
Optimizing Open-Ended Crowdsourcing: The Next Frontier in Crowdsourced Data Management
Aditya G. Parameswaran
Akash Das Sarma
Vipul Venkataraman
91
11
0
17 Oct 2016