1511.02283
Cited By
v1
v2
v3 (latest)
Generation and Comprehension of Unambiguous Object Descriptions
7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
Github (164★)
Papers citing
"Generation and Comprehension of Unambiguous Object Descriptions"
50 / 914 papers shown
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
International Conference on Learning Representations (ICLR), 2024
Amaia Cardiel
Éloi Zablocki
Oriane Siméoni
Elias Ramzi
Matthieu Cord
VLM
232
0
0
18 Sep 2024
Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension Guiding
International Conference on Natural Language Generation (INLG), 2024
Bram Willemsen
Gabriel Skantze
198
1
0
09 Sep 2024
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression
IEEE Transactions on Multimedia (IEEE TMM), 2024
Jingcheng Ke
Dele Wang
Jun-Cheng Chen
I-Hong Jhuo
Chia-Wen Lin
Yen-Yu Lin
206
1
0
05 Sep 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
European Conference on Computer Vision (ECCV), 2024
Yi-Chia Chen
Wei-Hua Li
Cheng Sun
Yu-Chiang Frank Wang
Chu-Song Chen
VLM
208
52
0
01 Sep 2024
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding
ACM Multimedia (MM), 2024
Minghang Zheng
Jiahua Zhang
Qingchao Chen
Yuxin Peng
Yang Liu
ObjD
254
5
0
29 Aug 2024
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
International Conference on Learning Representations (ICLR), 2024
Fangxun Shu
Yue Liao
Le Zhuo
Chenning Xu
Guanghao Zhang
...
Bolin Li
Zhelun Yu
Si Liu
Hongsheng Li
Hao Jiang
VLM
MoE
178
30
0
28 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
186
2
0
28 Aug 2024
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras
Pratik K. Mishra
Irene Ballester
Andrea Iaboni
Bing Ye
Kristine Newman
Alex Mihailidis
Shehroz S. Khan
185
2
0
28 Aug 2024
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
AAAI Conference on Artificial Intelligence (AAAI), 2024
Bin Wang
Chunyu Xie
Dawei Leng
Yuhui Yin
MLLM
401
5
0
23 Aug 2024
Visual Agents as Fast and Slow Thinkers
International Conference on Learning Representations (ICLR), 2024
Guangyan Sun
Haoyang Ling
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Dongfang Liu
LLMAG
LRM
426
41
0
16 Aug 2024
Towards Flexible Visual Relationship Segmentation
Neural Information Processing Systems (NeurIPS), 2024
Fangrui Zhu
Jianwei Yang
Huaizu Jiang
VOS
256
4
0
15 Aug 2024
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation
IEEE Transactions on Multimedia (IEEE TMM), 2024
Yubin Cho
Hyunwoo Yu
Suk-Ju Kang
209
32
0
14 Aug 2024
Revisiting Multi-Modal LLM Evaluation
Jian Lu
Shikhar Srivastava
Junyu Chen
Robik Shrestha
Manoj Acharya
Kushal Kafle
Christopher Kanan
133
5
0
09 Aug 2024
How Well Can Vision Language Models See Image Details?
Chenhui Gou
Abdulwahab Felemban
Faizan Farooq Khan
Deyao Zhu
Jianfei Cai
Hamid Rezatofighi
Mohamed Elhoseiny
VLM
MLLM
208
12
0
07 Aug 2024
One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning
Pattern Recognition (Pattern Recogn.), 2024
Hao Sun
Yu Song
Jiaqing Liu
Jihong Hu
Yen-Wei Chen
Lanfen Lin
VLM
218
0
0
06 Aug 2024
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Zhaowei Li
Wei Wang
Yiqing Cai
Xu Qi
Pengyu Wang
Dong Zhang
Hang Song
Botian Jiang
Zhida Huang
Tao Wang
AIFin
LRM
192
7
0
05 Aug 2024
A Novel Evaluation Framework for Image2Text Generation
Jia-Hong Huang
Hongyi Zhu
Yixian Shen
Stevan Rudinac
A. M. Pacces
Evangelos Kanoulas
177
11
0
03 Aug 2024
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses
ACM Multimedia (MM), 2024
Chaolei Tan
Zihang Lin
Junfu Pu
Chen Ma
Wei-Yi Pei
Zhi Qu
Yexin Wang
Ying Shan
Wei-Shi Zheng
Jianfang Hu
AI4TS
295
2
0
03 Aug 2024
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
European Conference on Computer Vision (ECCV), 2024
Wei Chen
Mahdieh Hatamian
Yu Wu
198
16
0
02 Aug 2024
Add-SD: Rational Generation without Manual Reference
Lingfeng Yang
Xinyu Zhang
Xiang Li
Jinwen Chen
Kun Yao
Qiang Chen
Errui Ding
Ling-Ling Liu
Jingdong Wang
Jian Yang
137
1
0
30 Jul 2024
3D-GRES: Generalized 3D Referring Expression Segmentation
Changli Wu
Yihang Liu
Jiayi Ji
Yiwei Ma
Haowei Wang
Gen Luo
Henghui Ding
Xiaoshuai Sun
Rongrong Ji
199
15
0
30 Jul 2024
Look Hear: Gaze Prediction for Speech-directed Human Attention
European Conference on Computer Vision (ECCV), 2024
Sounak Mondal
Seoyoung Ahn
Zhibo Yang
Niranjan Balasubramanian
Dimitris Samaras
G. Zelinsky
Minh Hoai
361
3
0
28 Jul 2024
Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models
Xiang Shi
Jiawei Liu
Yinpeng Liu
Qikai Cheng
Wei Lu
141
0
0
26 Jul 2024
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
Shuting He
Henghui Ding
211
18
0
25 Jul 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li
Junfeng Wu
Weizhi Zhao
Song Bai
Xiang Bai
168
13
0
23 Jul 2024
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Ziyuan Huang
Kaixiang Ji
Biao Gong
Zhiwu Qing
Qinglong Zhang
Kecheng Zheng
Jian Wang
Jingdong Chen
Ming Yang
LRM
174
5
0
22 Jul 2024
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
225
17
0
18 Jul 2024
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
Leyang Shen
Gongwei Chen
Rui Shao
Weili Guan
Liqiang Nie
MoE
137
33
0
17 Jul 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
416
186
0
17 Jul 2024
VISA: Reasoning Video Object Segmentation via Large Language Models
Cilin Yan
Haochen Wang
Shilin Yan
Xiaolong Jiang
Yao Hu
Guoliang Kang
Weidi Xie
E. Gavves
LRM
VLM
VOS
208
90
0
16 Jul 2024
LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task
Khai-Nguyen Nguyen
Ryan Zhang
Ngoc Son Nguyen
Tan-Hanh Pham
Anh Dao
Ba Hung Ngo
Anh Totti Nguyen
Truong-Son Hy
MedIm
LM&MA
167
5
0
16 Jul 2024
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang
Xinpeng Ding
Chunwei Wang
J. N. Han
Yulong Liu
Hengshuang Zhao
Hang Xu
Lu Hou
Wei Zhang
Xiaodan Liang
VLM
168
13
0
11 Jul 2024
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Haiwen Diao
Bo Wan
Xu Jia
Yunzhi Zhuge
Ying Zhang
Huchuan Lu
Long Chen
VLM
211
10
0
10 Jul 2024
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu
Paul Hongsuck Seo
Jeany Son
DiffM
328
11
0
10 Jul 2024
ActionVOS: Actions as Prompts for Video Object Segmentation
Liangyang Ouyang
Ruicong Liu
Yifei Huang
Ryosuke Furuta
Yoichi Sato
VOS
159
7
0
10 Jul 2024
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang
Jiaqi Hu
Lianrui Mu
Rui Hu
Xiaoyu Liang
Jiangnan Ye
Haoji Hu
CLIP
VLM
266
7
0
08 Jul 2024
Cognitive Modeling with Scaffolded LLMs: A Case Study of Referential Expression Generation
Polina Tsvilodub
Michael Franke
Fausto Carcassi
137
2
0
04 Jul 2024
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
Weitai Kang
Mengxue Qu
Yunchao Wei
Yan Yan
270
8
0
03 Jul 2024
Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang
Luowei Zhou
Junyi Wu
Changchang Sun
Yan Yan
205
9
0
03 Jul 2024
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang
Gaowen Liu
Mubarak Shah
Yan Yan
ObjD
316
17
0
03 Jul 2024
SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
Sayan Nag
Koustava Goswami
Srikrishna Karanam
242
6
0
02 Jul 2024
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang
Yijun Liu
Fei Yu
Chen Huang
Kexin Li
Zhiguo Wan
Wanxiang Che
VLM
CoGe
129
7
0
01 Jul 2024
Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
Takayuki Nishimura
Katsuyuki Kuyo
Motonari Kambara
Komei Sugiura
DiffM
224
1
0
01 Jul 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
487
53
0
28 Jun 2024
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su
Peihan Miao
Huanzhang Dou
Xi Li
ObjD
227
15
0
26 Jun 2024
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
Jierun Chen
Fangyun Wei
Jinjing Zhao
Sizhe Song
Bohuai Wu
Zhuoxuan Peng
S.-H. Gary Chan
Hongyang R. Zhang
221
30
0
24 Jun 2024
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results
Henghui Ding
Chang Liu
Yunchao Wei
Nikhila Ravi
Shuting He
...
Bo Zhao
Jing Liu
Feiyu Pan
Hao Fang
Xiankai Lu
201
10
0
24 Jun 2024
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?
Gregor Geigle
Radu Timofte
Goran Glavaš
219
2
0
20 Jun 2024
GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation
Ci-Siang Lin
I-Jieh Liu
Min-Hung Chen
Chien-Yi Wang
Sifei Liu
Yu-Chiang Frank Wang
VOS
191
1
0
18 Jun 2024
Unveiling Encoder-Free Vision-Language Models
Haiwen Diao
Yufeng Cui
Xiaotong Li
Yueze Wang
Huchuan Lu
Xinlong Wang
VLM
198
63
0
17 Jun 2024