1608.03410
Solving Visual Madlibs with Multiple Cues
11 August 2016
Tatiana Tommasi
Arun Mallya
Bryan A. Plummer
Svetlana Lazebnik
Alexander C. Berg
Tamara L. Berg
Papers citing "Solving Visual Madlibs with Multiple Cues"
12 / 12 papers shown
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumder
Soujanya Poria
Roger Zimmermann
Amir Zadeh
18
6
0
19 Oct 2020
Language Features Matter: Effective Language Representations for Vision-Language Tasks
Andrea Burns
Reuben Tan
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
19
27
0
17 Aug 2019
AttentionRNN: A Structured Spatial Attention Mechanism
Siddhesh Khandelwal
Leonid Sigal
8
3
0
22 May 2019
Neural Sequential Phrase Grounding (SeqGROUND)
Pelin Dogan
Leonid Sigal
Markus Gross
ObjD
16
51
0
18 Mar 2019
Learning Visual Question Answering by Bootstrapping Hard Attention
Mateusz Malinowski
Carl Doersch
Adam Santoro
Peter W. Battaglia
OOD
19
96
0
01 Aug 2018
Conditional Image-Text Embedding Networks
Bryan A. Plummer
Paige Kordas
M. Kiapour
Shuai Zheng
Robinson Piramuthu
Svetlana Lazebnik
13
118
0
22 Nov 2017
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
Hirokatsu Kataoka
Soma Shirakabe
Yun He
S. Ueta
Teppei Suzuki
...
Ryousuke Takasawa
Masataka Fuchida
Yudai Miyashita
Kazushige Okayasu
Yuta Matsuzaki
22
1
0
20 Jul 2017
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
Tanmay Gupta
Kevin J. Shih
Saurabh Singh
Derek Hoiem
29
26
0
02 Apr 2017
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Peng Wang
Qi Wu
Chunhua Shen
A. Hengel
OOD
18
86
0
16 Dec 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
149
1,465
0
06 Jun 2016
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
J. Hockenmaier
Svetlana Lazebnik
48
1,998
0
19 May 2015
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics
Yunchao Gong
Qifa Ke
Michael Isard
Svetlana Lazebnik
3DV
60
584
0
18 Dec 2012