Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models

19 May 2015

Bryan A. Plummer

Liwei Wang

Christopher M. Cervantes

Papers citing "Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models"

50 / 1,326 papers shown

Semi-supervised Visual Feature Integration for Pre-trained Language Models

246

01 Dec 2019

OptiBox: Breaking the Limits of Proposals for Visual Grounding

178

29 Nov 2019

Dividing and Conquering Cross-Modal Recipe Retrieval: from Nearest Neighbours Baselines to SoTA

250

28 Nov 2019

Learning Cross-modal Context Graph for Visual Grounding

AAAI Conference on Artificial Intelligence (AAAI), 2019

281

20 Nov 2019

Ladder Loss for Coherent Visual-Semantic Embedding

AAAI Conference on Artificial Intelligence (AAAI), 2019

301

18 Nov 2019

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2019

Chao Zhang

Zichao Yang

Xiaodong He

Li Deng

HAI AI4TS

325

416

10 Nov 2019

Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries

Neural Information Processing Systems (NeurIPS), 2019

Fuwen Tan

Paola Cascante-Bonilla

202

10 Nov 2019

Contextual Grounding of Natural Language Entities in Images

147

05 Nov 2019

Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships

International Conference on Learning Representations (ICLR), 2019

G. S. Kenigsfield

Ran El-Yaniv

125

27 Oct 2019

REMIND Your Neural Network to Prevent Catastrophic Forgetting

European Conference on Computer Vision (ECCV), 2019

441

333

06 Oct 2019

UNITER: UNiversal Image-TExt Representation Learning

European Conference on Computer Vision (ECCV), 2019

399

469

25 Sep 2019

Visually Grounded Generation of Entailments from Premises

International Conference on Natural Language Generation (INLG), 2019

Somayeh Jafaritazehjani

Albert Gatt

Marc Tanti

LRM

124

21 Sep 2019

ContCap: A scalable framework for continual image captioning

118

19 Sep 2019

Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings

112

14 Sep 2019

MULE: Multimodal Universal Language Embedding

AAAI Conference on Artificial Intelligence (AAAI), 2019

210

08 Sep 2019

Do Cross Modal Systems Leverage Semantic Relationships?

Shah Nawaz

Muhammad Kamran Janjua

136

03 Sep 2019

Phrase Grounding by Soft-Label Chain Conditional Random Field

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019

Hamish Ivison

Anjali Narayan-Chen

128

01 Sep 2019

Aesthetic Image Captioning From Weakly-Labelled Photographs

Koustav Ghosal

A. Rana

A. Smolic

274

29 Aug 2019

Probing Representations Learned by Multimodal Recurrent and Transformer Models

Jindrich Libovický

Pranava Madhyastha

146

29 Aug 2019

Adversarial Representation Learning for Text-to-Image Matching

IEEE International Conference on Computer Vision (ICCV), 2019

278

221

28 Aug 2019

Towards Unsupervised Image Captioning with Shared Multimodal Embeddings

IEEE International Conference on Computer Vision (ICCV), 2019

Iro Laina

Christian Rupprecht

Nassir Navab

SSL

221

112

25 Aug 2019

Phrase Localization Without Paired Training Examples

IEEE International Conference on Computer Vision (ICCV), 2019

Josiah Wang

Lucia Specia

138

20 Aug 2019

Zero-Shot Grounding of Objects from Natural Language Queries

IEEE International Conference on Computer Vision (ICCV), 2019

261

173

20 Aug 2019

A Fast and Accurate One-Stage Approach to Visual Grounding

IEEE International Conference on Computer Vision (ICCV), 2019

Dong Yu

305

432

18 Aug 2019

Language Features Matter: Effective Language Representations for Vision-Language Tasks

IEEE International Conference on Computer Vision (ICCV), 2019

192

17 Aug 2019

VisualBERT: A Simple and Performant Baseline for Vision and Language

745

2,227

09 Aug 2019

Semi Supervised Phrase Localization in a Bidirectional Caption-Image Retrieval Framework

Deepan Das

Noor Mohammed Ghouse

Shashank Verma

Yin Li

123

08 Aug 2019

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

Journal of Artificial Intelligence Research (JAIR), 2019

419

143

22 Jul 2019

Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation

Journal of Computacion y Sistemas (JCYS), 2019

Shantipriya Parida

Ondrej Bojar

S. Dash

192

21 Jul 2019

Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019

171

08 Jul 2019

Distilling Translations with Visual Awareness

Annual Meeting of the Association for Computational Linguistics (ACL), 2019

Julia Ive

Pranava Madhyastha

Lucia Specia

VLM

270

18 Jun 2019

Expressing Visual Relationships via Language

Annual Meeting of the Association for Computational Linguistics (ACL), 2019

246

18 Jun 2019

Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video

Annual Meeting of the Association for Computational Linguistics (ACL), 2019

306

111

06 Jun 2019

The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue

Annual Meeting of the Association for Computational Linguistics (ACL), 2019

186

04 Jun 2019

Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain

Automatic Speech Recognition & Understanding (ASRU), 2019

162

03 Jun 2019

Stochastic Generalized Adversarial Label Learning

Chidubem Arachie

Bert Huang

NoLa

109

03 Jun 2019

Learning to Generate Grounded Visual Captions without Localization Supervision

474

01 Jun 2019

Interactive-predictive neural multimodal systems

Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA), 2019

Álvaro Peris

F. Casacuberta

KELM HAI

201

30 May 2019

Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation

233

28 May 2019

Don't Blame Distributional Semantics if it can't do Entailment

International Conference on Computational Semantics (IWCS), 2019

M. Westera

Gemma Boleda

CoGe

166

17 May 2019

Deep Metric Learning Beyond Binary Supervision

154

103

21 Apr 2019

Saliency-Guided Attention Network for Image-Sentence Matching

250

20 Apr 2019

Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents

Jack Hessel

Lillian Lee

David M. Mimno

171

16 Apr 2019

Natural Language Semantics With Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics

David Schlangen

155

15 Apr 2019

Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions

Peratham Wiriyathammabhum

Abhinav Shrivastava

Vlad I. Morariu

L. Davis

155

08 Apr 2019

Modularized Textual Grounding for Counterfactual Resilience

212

07 Apr 2019

VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

Lei Li

536

654

06 Apr 2019

Good News, Everyone! Context driven entity-aware captioning for news images

197

156

02 Apr 2019

Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment

Devi Parikh

211

112

27 Mar 2019

Probing the Need for Visual Context in Multimodal Machine Translation

North American Chapter of the Association for Computational Linguistics (NAACL), 2019

Ozan Caglayan

Pranava Madhyastha

Lucia Specia

Loïc Barrault

217

154

20 Mar 2019