Learning to Assemble Neural Module Tree Networks for Visual Grounding

8 December 2018
Daqing Liu
Hanwang Zhang
Feng Wu
Zhengjun Zha

Papers citing "Learning to Assemble Neural Module Tree Networks for Visual Grounding"

50 / 58 papers shown
Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark
Ying Liu
Yijing Hua
Haojiang Chai
Yanbo Wang
TengQi Ye
ObjD
64
0
0
19 Mar 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
54
0
0
24 Feb 2025
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai
Jian Li
Jiedong Zhuang
Xian Zhang
Wankou Yang
ObjD
44
1
0
12 Jan 2025
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Ting Liu
Zunnan Xu
Yue Hu
Liangtao Shi
Zhiqiang Wang
Quanjun Yin
70
2
0
03 Jan 2025
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
236
0
0
01 Dec 2024
Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang
Luowei Zhou
Junyi Wu
Changchang Sun
Yan Yan
45
4
0
03 Jul 2024
HARIS: Human-Like Attention for Reference Image Segmentation
Mengxi Zhang
Heqing Lian
Yiming Liu
Jie Chen
VLM
26
0
0
17 May 2024
Deep Instruction Tuning for Segment Anything Model
Xiaorui Huang
Gen Luo
Chaoyang Zhu
Bo Tong
Yiyi Zhou
Xiaoshuai Sun
Rongrong Ji
VLM
57
1
0
31 Mar 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
41
0
0
26 Mar 2024
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
54
4
0
28 Dec 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
34
2
0
06 Sep 2023
PM-DETR: Domain Adaptive Prompt Memory for Object Detection with Transformers
Peidong Jia
Jiaming Liu
Senqiao Yang
Jiarui Wu
Xiaodong Xie
Shanghang Zhang
VLM
52
2
0
01 Jul 2023
Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation
Chang Liu
Henghui Ding
Yulun Zhang
Xudong Jiang
34
47
0
24 May 2023
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding
Chenchi Zhang
Jun Xiao
Lei Chen
Jian Shao
Long Chen
VLM
LRM
34
2
0
19 May 2023
Referring Multi-Object Tracking
Dongming Wu
Wencheng Han
Tiancai Wang
Xingping Dong
Xiangyu Zhang
Jianbing Shen
40
71
0
06 Mar 2023
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network
Haowei Wang
Jiayi Ji
Yiyi Zhou
Yongjian Wu
Xiaoshuai Sun
38
15
0
09 Jan 2023
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
Jianzong Wu
Xiangtai Li
Henghui Ding
Xia Li
Guangliang Cheng
Yu Tong
Chen Change Loy
VLM
97
31
0
02 Jan 2023
Learning Label Modular Prompts for Text Classification in the Wild
Hailin Chen
Amrita Saha
Chenyu You
Steven C. H. Hoi
OOD
VLM
26
5
0
30 Nov 2022
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
50
26
0
28 Nov 2022
Who are you referring to? Coreference resolution in image narrations
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
27
3
0
26 Nov 2022
A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation
Shijia Huang
Feng Li
Hao Zhang
Siyi Liu
Lei Zhang
Liwei Wang
30
5
0
15 Nov 2022
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
21
21
0
15 Nov 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
Yangfan Zhan
Zhitong Xiong
Yuan Yuan
78
108
0
23 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
45
10
0
04 Oct 2022
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
30
23
0
28 Sep 2022
RefCrowd: Grounding the Target in Crowd with Referring Expressions
Heqian Qiu
Hongliang Li
Taijin Zhao
Lanxiao Wang
Qingbo Wu
Fanman Meng
ObjD
32
6
0
16 Jun 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
Li Yang
Yan Xu
Chunfen Yuan
Wei Liu
Bing Li
Weiming Hu
ObjD
52
113
0
30 Apr 2022
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection
Jun-Bin Luo
Jiahui Fu
Xianghao Kong
Chen Gao
Haibing Ren
Hao Shen
Huaxia Xia
Si Liu
37
89
0
13 Apr 2022
Multi-View Transformer for 3D Visual Grounding
Shijia Huang
Yilun Chen
Jiaya Jia
Liwei Wang
31
114
0
05 Apr 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
19
61
0
29 Mar 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
48
51
0
16 Mar 2022
CRIS: CLIP-Driven Referring Image Segmentation
Zhaoqing Wang
Yu Lu
Qiang Li
Xunqiang Tao
Yan Guo
Ming Gong
Tongliang Liu
VLM
63
361
0
30 Nov 2021
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
39
33
0
17 Nov 2021
A Review of the Gumbel-max Trick and its Extensions for Discrete Stochasticity in Machine Learning
Iris A. M. Huijben
W. Kool
Max B. Paulus
Ruud J. G. van Sloun
28
94
0
04 Oct 2021
Panoptic Narrative Grounding
Cristina González
Nicolás Ayobi
Isabela Hernández
José Hernández
Jordi Pont-Tuset
Pablo Arbeláez
90
22
0
10 Sep 2021
Auto-Parsing Network for Image Captioning and Visual Question Answering
Xu Yang
Chongyang Gao
Hanwang Zhang
Jianfei Cai
24
35
0
24 Aug 2021
Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers
Wen Wang
Yang Cao
Jing Zhang
Fengxiang He
Zhengjun Zha
Yonggang Wen
Dacheng Tao
ViT
29
94
0
27 Jul 2021
Disentangle Your Dense Object Detector
Zehui Chen
Chenhongyi Yang
Qiaofei Li
Feng Zhao
Zhengjun Zha
Feng Wu
3DV
27
147
0
07 Jul 2021
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Muchen Li
Leonid Sigal
ObjD
13
189
0
06 Jun 2021
TransVG: End-to-End Visual Grounding with Transformers
Jiajun Deng
Zhengyuan Yang
Tianlang Chen
Wen-gang Zhou
Houqiang Li
ViT
28
332
0
17 Apr 2021
Locate then Segment: A Strong Pipeline for Referring Image Segmentation
Ya Jing
Tao Kong
Wei Wang
Liang Wang
Lei Li
Tieniu Tan
15
132
0
30 Mar 2021
Decoupled Spatial Temporal Graphs for Generic Visual Grounding
Qi Feng
Yunchao Wei
Mingming Cheng
Yi Yang
27
5
0
18 Mar 2021
Linguistic Structure Guided Context Modeling for Referring Image Segmentation
Tianrui Hui
Si Liu
Shaofei Huang
Guanbin Li
Sansi Yu
Faxi Zhang
Jizhong Han
21
148
0
01 Oct 2020
RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation
Míriam Bellver
Carles Ventura
Carina Silberer
Ioannis V. Kazakos
Jordi Torres
Xavier Giró-i-Nieto
VOS
29
32
0
01 Oct 2020
AttnGrounder: Talking to Cars with Attention
Vivek Mittal
ViT
32
11
0
11 Sep 2020
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen
Wenbo Ma
Jun Xiao
Hanwang Zhang
Shih-Fu Chang
ObjD
17
89
0
03 Sep 2020
PhraseCut: Language-based Image Segmentation in the Wild
Chenyun Wu
Zhe-nan Lin
Scott D. Cohen
Trung Bui
Subhransu Maji
VLM
13
111
0
03 Aug 2020
Referring Expression Comprehension: A Survey of Methods and Datasets
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
50
93
0
19 Jul 2020
Learning to Discretely Compose Reasoning Module Networks for Video Captioning
Ganchao Tan
Daqing Liu
Meng Wang
Zhengjun Zha
LRM
25
73
0
17 Jul 2020