ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1511.02283
  4. Cited By
Generation and Comprehension of Unambiguous Object Descriptions
v1v2v3 (latest)

Generation and Comprehension of Unambiguous Object Descriptions

7 November 2015
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
    ObjD
ArXiv (abs)PDFHTMLGithub (164★)

Papers citing "Generation and Comprehension of Unambiguous Object Descriptions"

50 / 914 papers shown
Title
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression ComprehensionInternational Conference on Learning Representations (ICLR), 2024
Amaia Cardiel
Éloi Zablocki
Oriane Siméoni
Elias Ramzi
Matthieu Cord
VLM
232
0
0
18 Sep 2024
Referring Expression Generation in Visually Grounded Dialogue with
  Discourse-aware Comprehension Guiding
Referring Expression Generation in Visually Grounded Dialogue with Discourse-aware Comprehension GuidingInternational Conference on Natural Language Generation (INLG), 2024
Bram Willemsen
Gabriel Skantze
198
1
0
09 Sep 2024
Make Graph-based Referring Expression Comprehension Great Again through
  Expression-guided Dynamic Gating and Regression
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and RegressionIEEE transactions on multimedia (IEEE TMM), 2024
Jingcheng Ke
Dele Wang
Jun-Cheng Chen
I-Hong Jhuo
Chia-Wen Lin
Yen-Yu Lin
206
1
0
05 Sep 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring
  Expression Segmentation
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression SegmentationEuropean Conference on Computer Vision (ECCV), 2024
Yi-Chia Chen
Wei-Hua Li
Cheng Sun
Yu-Chiang Frank Wang
Chu-Song Chen
VLM
208
52
0
01 Sep 2024
ResVG: Enhancing Relation and Semantic Understanding in Multiple
  Instances for Visual Grounding
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual GroundingACM Multimedia (MM), 2024
Minghang Zheng
Jiahua Zhang
Qingchao Chen
Yuxin Peng
Yang Liu
ObjD
254
5
0
29 Aug 2024
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge DistillationInternational Conference on Learning Representations (ICLR), 2024
Fangxun Shu
Yue Liao
Le Zhuo
Chenning Xu
Guanghao Zhang
...
Bolin Li
Zhelun Yu
Si Liu
Hongsheng Li
Hao Jiang
VLMMoE
178
30
0
28 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DVVLM
186
2
0
28 Aug 2024
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras
Pratik K. Mishra
Irene Ballester
Andrea Iaboni
Bing Ye
Kristine Newman
Alex Mihailidis
Shehroz S. Khan
185
2
0
28 Aug 2024
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal CapabilitiesAAAI Conference on Artificial Intelligence (AAAI), 2024
Bin Wang
Chunyu Xie
Dawei Leng
Yuhui Yin
MLLM
401
5
0
23 Aug 2024
Visual Agents as Fast and Slow Thinkers
Visual Agents as Fast and Slow ThinkersInternational Conference on Learning Representations (ICLR), 2024
Guangyan Sun
Haoyang Ling
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Ying Nian Wu
Dongfang Liu
Dongfang Liu
LLMAGLRM
426
41
0
16 Aug 2024
Towards Flexible Visual Relationship Segmentation
Towards Flexible Visual Relationship SegmentationNeural Information Processing Systems (NeurIPS), 2024
Fangrui Zhu
Jianwei Yang
Huaizu Jiang
VOS
256
4
0
15 Aug 2024
Cross-aware Early Fusion with Stage-divided Vision and Language
  Transformer Encoders for Referring Image Segmentation
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image SegmentationIEEE transactions on multimedia (IEEE TMM), 2024
Yubin Cho
Hyunwoo Yu
Suk-Ju Kang
209
32
0
14 Aug 2024
Revisiting Multi-Modal LLM Evaluation
Revisiting Multi-Modal LLM Evaluation
Jian Lu
Shikhar Srivastava
Junyu Chen
Robik Shrestha
Manoj Acharya
Kushal Kafle
Christopher Kanan
133
5
0
09 Aug 2024
How Well Can Vision Language Models See Image Details?
How Well Can Vision Language Models See Image Details?
Chenhui Gou
Abdulwahab Felemban
Faizan Farooq Khan
Deyao Zhu
Jianfei Cai
Hamid Rezatofighi
Mohamed Elhoseiny
VLMMLLM
208
12
0
07 Aug 2024
One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-Tuning
One Framework to Rule Them All: Unifying Multimodal Tasks with LLM Neural-TuningPattern Recognition (Pattern Recogn.), 2024
Hao Sun
Yu Song
Jiaqing Liu
Jihong Hu
Yen-Wei Chen
Lanfen Lin
VLM
218
0
0
06 Aug 2024
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks
  With Large Language Model
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language ModelNorth American Chapter of the Association for Computational Linguistics (NAACL), 2024
Zhaowei Li
Wei Wang
Yiqing Cai
Xu Qi
Pengyu Wang
Dong Zhang
Hang Song
Botian Jiang
Zhida Huang
Tao Wang
AIFinLRM
192
7
0
05 Aug 2024
A Novel Evaluation Framework for Image2Text Generation
A Novel Evaluation Framework for Image2Text Generation
Jia-Hong Huang
Hongyi Zhu
Yixian Shen
Stevan Rudinac
A. M. Pacces
Evangelos Kanoulas
177
11
0
03 Aug 2024
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding
  from TV Dramas and Synopses
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and SynopsesACM Multimedia (MM), 2024
Chaolei Tan
Zihang Lin
Junfu Pu
Chen Ma
Wei-Yi Pei
Zhi Qu
Yexin Wang
Ying Shan
Wei-Shi Zheng
Jianfang Hu
AI4TS
295
2
0
03 Aug 2024
An Efficient and Effective Transformer Decoder-Based Framework for
  Multi-Task Visual Grounding
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual GroundingEuropean Conference on Computer Vision (ECCV), 2024
Wei Chen
Mahdieh Hatamian
Yu Wu
198
16
0
02 Aug 2024
Add-SD: Rational Generation without Manual Reference
Add-SD: Rational Generation without Manual Reference
Lingfeng Yang
Xinyu Zhang
Xiang Li
Jinwen Chen
Kun Yao
Qiang Chen
Errui Ding
Ling-Ling Liu
Jingdong Wang
Jian Yang
137
1
0
30 Jul 2024
3D-GRES: Generalized 3D Referring Expression Segmentation
3D-GRES: Generalized 3D Referring Expression Segmentation
Changli Wu
Yihang Liu
Jiayi Ji
Yiwei Ma
Haowei Wang
Gen Luo
Henghui Ding
Xiaoshuai Sun
Rongrong Ji
199
15
0
30 Jul 2024
Look Hear: Gaze Prediction for Speech-directed Human Attention
Look Hear: Gaze Prediction for Speech-directed Human AttentionEuropean Conference on Computer Vision (ECCV), 2024
Sounak Mondal
Seoyoung Ahn
Zhibo Yang
Niranjan Balasubramanian
Dimitris Samaras
G. Zelinsky
Minh Hoai
361
3
0
28 Jul 2024
Every Part Matters: Integrity Verification of Scientific Figures Based
  on Multimodal Large Language Models
Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models
Xiang Shi
Jiawei Liu
Yinpeng Liu
Qikai Cheng
Wei Lu
141
0
0
26 Jul 2024
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
Shuting He
Henghui Ding
211
18
0
25 Jul 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li
Junfeng Wu
Weizhi Zhao
Song Bai
Xiang Bai
168
13
0
23 Jul 2024
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Ziyuan Huang
Kaixiang Ji
Biao Gong
Zhiwu Qing
Qinglong Zhang
Kecheng Zheng
Jian Wang
Jingdong Chen
Ming Yang
LRM
174
5
0
22 Jul 2024
Learning Visual Grounding from Generative Vision and Language Model
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
225
17
0
18 Jul 2024
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large
  Language Models
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
Leyang Shen
Gongwei Chen
Rui Shao
Weili Guan
Liqiang Nie
MoE
137
33
0
17 Jul 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
416
186
0
17 Jul 2024
VISA: Reasoning Video Object Segmentation via Large Language Models
VISA: Reasoning Video Object Segmentation via Large Language Models
Cilin Yan
Haochen Wang
Shilin Yan
Xiaolong Jiang
Yao Hu
Guoliang Kang
Weidi Xie
E. Gavves
LRMVLMVOS
208
90
0
16 Jul 2024
LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization
  and Classification Task
LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task
Khai-Nguyen Nguyen
Ryan Zhang
Ngoc Son Nguyen
Tan-Hanh Pham
Anh Dao
Ba Hung Ngo
Anh Totti Nguyen
Truong-Son Hy
MedImLM&MA
167
5
0
16 Jul 2024
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large
  Vision-Language Models
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang
Xinpeng Ding
Chunwei Wang
J. N. Han
Yulong Liu
Hengshuang Zhao
Hang Xu
Lu Hou
Wei Zhang
Xiaodan Liang
VLM
168
13
0
11 Jul 2024
SHERL: Synthesizing High Accuracy and Efficient Memory for
  Resource-Limited Transfer Learning
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Haiwen Diao
Bo Wan
Xu Jia
Yunzhi Zhuge
Ying Zhang
Huchuan Lu
Long Chen
VLM
211
10
0
10 Jul 2024
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring
  Image Segmentation
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu
Paul Hongsuck Seo
Jeany Son
DiffM
328
11
0
10 Jul 2024
ActionVOS: Actions as Prompts for Video Object Segmentation
ActionVOS: Actions as Prompts for Video Object Segmentation
Liangyang Ouyang
Ruicong Liu
Yifei Huang
Ryosuke Furuta
Yoichi Sato
VOS
159
7
0
10 Jul 2024
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot
  Performance
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang
Jiaqi Hu
Lianrui Mu
Rui Hu
Xiaoyu Liang
Jiangnan Ye
Haoji Hu
CLIPVLM
266
7
0
08 Jul 2024
Cognitive Modeling with Scaffolded LLMs: A Case Study of Referential
  Expression Generation
Cognitive Modeling with Scaffolded LLMs: A Case Study of Referential Expression Generation
Polina Tsvilodub
Michael Franke
Fausto Carcassi
137
2
0
04 Jul 2024
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
Weitai Kang
Mengxue Qu
Yunchao Wei
Yan Yan
270
8
0
03 Jul 2024
Visual Grounding with Attention-Driven Constraint Balancing
Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang
Luowei Zhou
Junyi Wu
Changchang Sun
Yan Yan
205
9
0
03 Jul 2024
SegVG: Transferring Object Bounding Box to Segmentation for Visual
  Grounding
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang
Gaowen Liu
Mubarak Shah
Yan Yan
ObjD
316
17
0
03 Jul 2024
SafaRi:Adaptive Sequence Transformer for Weakly Supervised Referring
  Expression Segmentation
SafaRi:Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation
Sayan Nag
Koustava Goswami
Srikrishna Karanam
242
6
0
02 Jul 2024
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding
  Evaluation
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang
Yijun Liu
Fei Yu
Chen Huang
Kexin Li
Zhiguo Wan
Wanxiang Che
VLMCoGe
129
7
0
01 Jul 2024
Object Segmentation from Open-Vocabulary Manipulation Instructions Based
  on Optimal Transport Polygon Matching with Multimodal Foundation Models
Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
Takayuki Nishimura
Katsuyuki Kuyo
Motonari Kambara
Komei Sugiura
DiffM
224
1
0
01 Jul 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
487
53
0
28 Jun 2024
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su
Peihan Miao
Huanzhang Dou
Xi Li
ObjD
227
15
0
26 Jun 2024
Revisiting Referring Expression Comprehension Evaluation in the Era of
  Large Multimodal Models
Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models
Jierun Chen
Fangyun Wei
Jinjing Zhao
Sizhe Song
Bohuai Wu
Zhuoxuan Peng
S.-H. Gary Chan
Hongyang R. Zhang
221
30
0
24 Jun 2024
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results
Henghui Ding
Chang Liu
Yunchao Wei
Nikhila Ravi
Shuting He
...
Bo Zhao
Jing Liu
Feiyu Pan
Hao Fang
Xiankai Lu
201
10
0
24 Jun 2024
Does Object Grounding Really Reduce Hallucination of Large
  Vision-Language Models?
Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models?
Gregor Geigle
Radu Timofte
Goran Glavaš
219
2
0
20 Jun 2024
GroPrompt: Efficient Grounded Prompting and Adaptation for Referring
  Video Object Segmentation
GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation
Ci-Siang Lin
I-Jieh Liu
Min-Hung Chen
Chien-Yi Wang
Sifei Liu
Yu-Chiang Frank Wang
VOS
191
1
0
18 Jun 2024
Unveiling Encoder-Free Vision-Language Models
Unveiling Encoder-Free Vision-Language Models
Haiwen Diao
Yufeng Cui
Xiaotong Li
Yueze Wang
Huchuan Lu
Xinlong Wang
VLM
198
63
0
17 Jun 2024
Previous
123456...171819
Next