ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1801.08186
  4. Cited By
MAttNet: Modular Attention Network for Referring Expression
  Comprehension

MAttNet: Modular Attention Network for Referring Expression Comprehension

24 January 2018
Licheng Yu
Zhe-nan Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Mohit Bansal
Tamara L. Berg
    ObjD
ArXivPDFHTML

Papers citing "MAttNet: Modular Attention Network for Referring Expression Comprehension"

50 / 161 papers shown
Title
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Calibrating Uncertainty Quantification of Multi-Modal LLMs using Grounding
Trilok Padhi
R. Kaur
Adam D. Cobb
Manoj Acharya
Anirban Roy
Colin Samplawski
Brian Matejek
Alexander M. Berenbeim
Nathaniel D. Bastian
Susmit Jha
28
0
0
30 Apr 2025
Efficient Adaptation For Remote Sensing Visual Grounding
Efficient Adaptation For Remote Sensing Visual Grounding
Hasan Moughnieh
Mohamad Chalhoub
Hasan Nasrallah
Cristiano Nattero
Paolo Campanella
Giovanni Nico
A. Ghandour
46
0
0
29 Mar 2025
Referring to Any Person
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
148
0
0
11 Mar 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
48
0
0
24 Feb 2025
AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation
AeroReformer: Aerial Referring Transformer for UAV-based Referring Image Segmentation
Rui Li
Xiaowei Zhao
56
0
0
23 Feb 2025
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai
Jian Li
Jiedong Zhuang
Xian Zhang
Wankou Yang
ObjD
42
1
0
12 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
X. Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming Yang
VLM
94
11
0
07 Jan 2025
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Ting Liu
Zunnan Xu
Yue Hu
Liangtao Shi
Zhiqiang Wang
Quanjun Yin
59
2
0
03 Jan 2025
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
146
0
0
01 Dec 2024
A Parameter-Efficient Tuning Framework for Language-guided Object Grounding and Robot Grasping
A Parameter-Efficient Tuning Framework for Language-guided Object Grounding and Robot Grasping
Houjian Yu
Mingen Li
Alireza Rezazadeh
Yang Yang
Changhyun Choi
46
1
0
28 Sep 2024
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
V. Bhat
P. Krishnamurthy
Ramesh Karri
Farshad Khorrami
46
3
0
16 Sep 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring
  Expression Segmentation
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen
Wei-Hua Li
Cheng Sun
Yu-Chiang Frank Wang
Chu-Song Chen
VLM
39
11
0
01 Sep 2024
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar
NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar
Runwei Guan
Jianan Liu
Liye Jia
Haocheng Zhao
Shanliang Yao
Xiaohui Zhu
Ka Lok Man
Eng Gee Lim
Jeremy S. Smith
Yutao Yue
51
5
0
30 Aug 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
27
1
0
30 Jul 2024
Visual Grounding with Attention-Driven Constraint Balancing
Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang
Luowei Zhou
Junyi Wu
Changchang Sun
Yan Yan
35
4
0
03 Jul 2024
Object Segmentation from Open-Vocabulary Manipulation Instructions Based
  on Optimal Transport Polygon Matching with Multimodal Foundation Models
Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
Takayuki Nishimura
Katsuyuki Kuyo
Motonari Kambara
Komei Sugiura
DiffM
30
0
0
01 Jul 2024
DARA: Domain- and Relation-aware Adapters Make Parameter-efficient
  Tuning for Visual Grounding
DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding
Ting Liu
Xuyang Liu
Siteng Huang
Honggang Chen
Quanjun Yin
Long Qin
Donglin Wang
Yue Hu
35
5
0
10 May 2024
Boosting Transferability in Vision-Language Attacks via Diversification
  along the Intersection Region of Adversarial Trajectory
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao
Xiaojun Jia
Xuhong Ren
Ivor Tsang
Qing-Wu Guo
AAML
38
14
0
19 Mar 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot
  Interaction
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
35
3
0
19 Feb 2024
Collaborative Position Reasoning Network for Referring Image
  Segmentation
Collaborative Position Reasoning Network for Referring Image Segmentation
Jianjian Cao
Beiya Dai
Yulin Li
Xiameng Qin
Jingdong Wang
27
0
0
22 Jan 2024
Learning-To-Rank Approach for Identifying Everyday Objects Using a
  Physical-World Search Engine
Learning-To-Rank Approach for Identifying Everyday Objects Using a Physical-World Search Engine
Kanta Kaneda
Shunya Nagashima
Ryosuke Korekata
Motonari Kambara
Komei Sugiura
35
6
0
26 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose
  Coarse-to-Fine Vision-Language Model
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
45
29
0
19 Dec 2023
Mask Grounding for Referring Image Segmentation
Mask Grounding for Referring Image Segmentation
Yong Xien Chng
Henry Zheng
Yizeng Han
Xuchong Qiu
Gao Huang
ISeg
ObjD
26
15
0
19 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
39
83
0
06 Dec 2023
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging
  Cross-Modal Attention with Large Language Models
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models
Haicheng Liao
Huanming Shen
Zhenning Li
Chengyue Wang
Guofa Li
Yiming Bie
Chengzhong Xu
34
50
0
06 Dec 2023
Synchronizing Vision and Language: Bidirectional Token-Masking
  AutoEncoder for Referring Image Segmentation
Synchronizing Vision and Language: Bidirectional Token-Masking AutoEncoder for Referring Image Segmentation
Minhyeok Lee
Dogyoon Lee
Jungho Lee
Suhwan Cho
Heeseung Choi
Ig-Jae Kim
Sangyoun Lee
28
0
0
29 Nov 2023
Continual Referring Expression Comprehension via Dual Modular
  Memorization
Continual Referring Expression Comprehension via Dual Modular Memorization
Hengtao Shen
Cheng Chen
Peng Wang
Lianli Gao
M. Wang
Jingkuan Song
ObjD
25
3
0
25 Nov 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
Chengyang Zhao
Yikang Shen
Zhenfang Chen
Mingyu Ding
Chuang Gan
48
15
0
10 Oct 2023
Temporal Collection and Distribution for Referring Video Object
  Segmentation
Temporal Collection and Distribution for Referring Video Object Segmentation
Jiajin Tang
Ge Zheng
Sibei Yang
VOS
36
14
0
07 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and
  Language Models
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
21
2
0
06 Sep 2023
VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual
  Grounders
VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders
Xuyang Liu
Siteng Huang
Yachen Kang
Honggang Chen
Donglin Wang
ObjD
25
12
0
03 Sep 2023
Language-Guided Diffusion Model for Visual Grounding
Language-Guided Diffusion Model for Visual Grounding
Sijia Chen
Baochun Li
27
5
0
18 Aug 2023
MeViS: A Large-scale Benchmark for Video Segmentation with Motion
  Expressions
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Henghui Ding
Chang Liu
Shuting He
Xudong Jiang
Chen Change Loy
VOS
44
101
0
16 Aug 2023
Intelligent Assistant Language Understanding On Device
Intelligent Assistant Language Understanding On Device
Cecilia Aas
Hisham Abdelsalam
Irina Belousova
Shruti Bhargava
Jianpeng Cheng
...
John Torr
Marco Del Vecchio
Jay Wacker
Jason D. Williams
Hong-ye Yu
11
2
0
07 Aug 2023
Learning Implicit Entity-object Relations by Bidirectional Generative
  Alignment for Multimodal NER
Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER
Feng Chen
Jiajia Liu
Kaixiang Ji
Wang Ren
Jian Wang
Jingdong Wang
19
8
0
03 Aug 2023
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for
  Referring Image Segmentation
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
Zunnan Xu
Zhihong Chen
Yong Zhang
Yibing Song
Xiang Wan
Guanbin Li
VLM
35
47
0
21 Jul 2023
Vision Language Transformers: A Survey
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
17
5
0
06 Jul 2023
Hierarchical Open-vocabulary Universal Image Segmentation
Hierarchical Open-vocabulary Universal Image Segmentation
Xudong Wang
Shufang Li
Konstantinos Kallidromitis
Yu Kato
Kazuki Kozuka
Trevor Darrell
VLM
OCL
37
36
0
03 Jul 2023
Bidirectional Correlation-Driven Inter-Frame Interaction Transformer for
  Referring Video Object Segmentation
Bidirectional Correlation-Driven Inter-Frame Interaction Transformer for Referring Video Object Segmentation
Meng Lan
Fu Rong
Zuchao Li
Wei Yu
L. Zhang
VOS
29
5
0
02 Jul 2023
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
Seoyeon Kim
Minguk Kang
Dongwon Kim
Jaesik Park
Suha Kwak
VLM
27
10
0
14 Jun 2023
PaintSeg: Training-free Segmentation via Painting
PaintSeg: Training-free Segmentation via Painting
Xiang Li
Chung-Ching Lin
Yinpeng Chen
Zicheng Liu
Jinglu Wang
Bhiksha Raj
34
5
0
30 May 2023
GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot
  Attention for Vision-and-Language Navigation
GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot Attention for Vision-and-Language Navigation
Jingyang Huo
Qiang Sun
Boyan Jiang
Haitao Lin
Yanwei Fu
36
19
0
26 May 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video
  Object Segmentation
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Hongyang Li
Yu Qiao
Hao Dong
Zhongjiang He
Peng Gao
VOS
20
30
0
25 May 2023
Multi-Modal Mutual Attention and Iterative Interaction for Referring
  Image Segmentation
Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation
Chang Liu
Henghui Ding
Yulun Zhang
Xudong Jiang
24
47
0
24 May 2023
Advancing Referring Expression Segmentation Beyond Single Image
Advancing Referring Expression Segmentation Beyond Single Image
YiXuan Wu
Zhao Zhang
Xie Chi
Feng Zhu
Rui Zhao
VLM
34
18
0
21 May 2023
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual
  Grounding
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding
Chenchi Zhang
Jun Xiao
Lei Chen
Jian Shao
Long Chen
VLM
LRM
24
2
0
19 May 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for
  Scene-Text VQA
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Yongxin Zhu
Z. Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
21
6
0
04 Apr 2023
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
Hao Zhang
Yeo Keat Ee
Basura Fernando
VLM
27
3
0
18 Mar 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image
  Segmentation
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip H. S. Torr
36
23
0
11 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set
  Object Detection
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
80
1,810
0
09 Mar 2023
1234
Next