ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1611.06641
  4. Cited By
Phrase Localization and Visual Relationship Detection with Comprehensive
  Image-Language Cues

Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues

21 November 2016
Bryan A. Plummer
Arun Mallya
Christopher M. Cervantes
J. Hockenmaier
Svetlana Lazebnik
ArXivPDFHTML

Papers citing "Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues"

21 / 21 papers shown
Title
Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark
Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark
Ying Liu
Yijing Hua
Haojiang Chai
Yanbo Wang
TengQi Ye
ObjD
54
0
0
19 Mar 2025
Language-Guided Diffusion Model for Visual Grounding
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen
Baochun Li
27
5
0
18 Aug 2023
Adapting CLIP For Phrase Localization Without Further Training
Adapting CLIP For Phrase Localization Without Further Training
Jiahao Li
G. Shakhnarovich
Raymond A. Yeh
VLM
CLIP
28
25
0
07 Apr 2022
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal
  Matching
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
18
10
0
18 Jan 2022
Weakly-Supervised Video Object Grounding via Causal Intervention
Weakly-Supervised Video Object Grounding via Causal Intervention
Wei Wang
Junyu Gao
Changsheng Xu
CML
28
20
0
01 Dec 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
55
858
0
26 Apr 2021
COOT: Cooperative Hierarchical Transformer for Video-Text Representation
  Learning
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
13
168
0
01 Nov 2020
Enriching Video Captions With Contextual Text
Enriching Video Captions With Contextual Text
Philipp Rimle
Pelin Dogan
Markus Gross
22
3
0
29 Jul 2020
Detecting Human-Object Interactions with Action Co-occurrence Priors
Detecting Human-Object Interactions with Action Co-occurrence Priors
Dong-Jin Kim
Xiao Sun
Jinsoo Choi
Stephen Lin
In So Kweon
8
124
0
17 Jul 2020
Explanation-based Weakly-supervised Learning of Visual Relations with
  Graph Networks
Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks
Federico Baldassarre
Kevin Smith
Josephine Sullivan
Hossein Azizpour
19
24
0
16 Jun 2020
A Real-time Global Inference Network for One-stage Referring Expression
  Comprehension
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
24
60
0
07 Dec 2019
Learning Visual Relation Priors for Image-Text Matching and Image
  Captioning with Neural Scene Graph Generators
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
19
37
0
22 Sep 2019
Phrase Localization Without Paired Training Examples
Phrase Localization Without Paired Training Examples
Josiah Wang
Lucia Specia
21
41
0
20 Aug 2019
Contextual Translation Embedding for Visual Relationship Detection and
  Scene Graph Generation
Contextual Translation Embedding for Visual Relationship Detection and Scene Graph Generation
Zih-Siou Hung
Arun Mallya
Svetlana Lazebnik
ViT
12
14
0
28 May 2019
Context-Dependent Diffusion Network for Visual Relationship Detection
Context-Dependent Diffusion Network for Visual Relationship Detection
Zhen Cui
Chunyan Xu
Wenming Zheng
Jian Yang
GNN
12
50
0
11 Sep 2018
Mapping Images to Scene Graphs with Permutation-Invariant Structured
  Prediction
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
Roei Herzig
Moshiko Raboh
Gal Chechik
Jonathan Berant
Amir Globerson
GNN
OCL
24
133
0
15 Feb 2018
Scene Graph Generation from Objects, Phrases and Region Captions
Scene Graph Generation from Objects, Phrases and Region Captions
Yikang Li
Wanli Ouyang
Bolei Zhou
Kun Wang
Xiaogang Wang
21
499
0
31 Jul 2017
Pixels to Graphs by Associative Embedding
Pixels to Graphs by Associative Embedding
Alejandro Newell
Jia Deng
GNN
VOS
22
232
0
22 Jun 2017
Visual Translation Embedding Network for Visual Relation Detection
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
140
560
0
27 Feb 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
149
1,465
0
06 Jun 2016
A Multi-View Embedding Space for Modeling Internet Images, Tags, and
  their Semantics
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics
Yunchao Gong
Qifa Ke
Michael Isard
Svetlana Lazebnik
3DV
60
584
0
18 Dec 2012
1