1611.06641
Cited By
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
21 November 2016
Bryan A. Plummer
Arun Mallya
Christopher M. Cervantes
J. Hockenmaier
Svetlana Lazebnik
Papers citing
"Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues"
21 / 21 papers shown
Title
Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark
Ying Liu
Yijing Hua
Haojiang Chai
Yanbo Wang
TengQi Ye
ObjD
54
0
0
19 Mar 2025
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen
Baochun Li
27
5
0
18 Aug 2023
Adapting CLIP For Phrase Localization Without Further Training
Jiahao Li
G. Shakhnarovich
Raymond A. Yeh
VLM
CLIP
28
25
0
07 Apr 2022
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
16
10
0
18 Jan 2022
Weakly-Supervised Video Object Grounding via Causal Intervention
Wei Wang
Junyu Gao
Changsheng Xu
CML
28
20
0
01 Dec 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
55
858
0
26 Apr 2021
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
13
168
0
01 Nov 2020
Enriching Video Captions With Contextual Text
Philipp Rimle
Pelin Dogan
Markus Gross
22
3
0
29 Jul 2020
Detecting Human-Object Interactions with Action Co-occurrence Priors
Dong-Jin Kim
Xiao Sun
Jinsoo Choi
Stephen Lin
In So Kweon
6
124
0
17 Jul 2020
Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks
Federico Baldassarre
Kevin Smith
Josephine Sullivan
Hossein Azizpour
19
24
0
16 Jun 2020
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
24
60
0
07 Dec 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
19
37
0
22 Sep 2019
Phrase Localization Without Paired Training Examples
Josiah Wang
Lucia Specia
21
41
0
20 Aug 2019
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Zih-Siou Hung
Arun Mallya
Svetlana Lazebnik
ViT
12
14
0
28 May 2019
Context-Dependent Diffusion Network for Visual Relationship Detection
Zhen Cui
Chunyan Xu
Wenming Zheng
Jian Yang
GNN
12
50
0
11 Sep 2018
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
Roei Herzig
Moshiko Raboh
Gal Chechik
Jonathan Berant
Amir Globerson
GNN
OCL
24
133
0
15 Feb 2018
Scene Graph Generation from Objects, Phrases and Region Captions
Yikang Li
Wanli Ouyang
Bolei Zhou
Kun Wang
Xiaogang Wang
21
499
0
31 Jul 2017
Pixels to Graphs by Associative Embedding
Alejandro Newell
Jia Deng
GNN
VOS
22
232
0
22 Jun 2017
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
140
560
0
27 Feb 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
149
1,465
0
06 Jun 2016
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics
Yunchao Gong
Qifa Ke
Michael Isard
Svetlana Lazebnik
3DV
60
584
0
18 Dec 2012