2111.15174
Cited By
v1
v2 (latest)
CRIS: CLIP-Driven Referring Image Segmentation
30 November 2021
Zhaoqing Wang
Yu Lu
Qiang Li
Xunqiang Tao
Yan Guo
Ming Gong
Tongliang Liu
VLM
Papers citing
"CRIS: CLIP-Driven Referring Image Segmentation"
50 / 288 papers shown
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yun Wang
Jingchen Ni
Yong-Jin Liu
Chun Yuan
Yansong Tang
292
15
0
02 Mar 2025
Grad-ECLIP: Gradient-based Visual and Textual Explanations for CLIP
Chenyang Zhao
Kun Wang
J. H. Hsiao
Antoni B. Chan
CLIP
267
7
0
26 Feb 2025
A Survey on Foundation-Model-Based Industrial Defect Detection
Tianle Yang
Luyao Chang
Jiadong Yan
Jiajian Li
Zhi Wang
Ke Zhang
AI4CE
515
6
0
26 Feb 2025
Pixel-Level Reasoning Segmentation via Multi-turn Conversations
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Dexian Cai
Xiaocui Yang
Yongkang Liu
Daling Wang
Shi Feng
Yifei Zhang
Soujanya Poria
LRM
335
3
0
13 Feb 2025
SIREN: Semantic, Initialization-Free Registration of Multi-Robot Gaussian Splatting Maps
Ola Shorinwa
Jiankai Sun
Mac Schwager
Anirudha Majumdar
3DGS
381
9
0
10 Feb 2025
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
Lin Chen
Qi Yang
Kun Ding
Tianying Wang
Gang Shen
Fei Li
Qiyuan Cao
Shiming Xiang
VLM
207
2
0
29 Jan 2025
Dynamic Scene Understanding from Vision-Language Representations
Shahaf Pruss
Morris Alper
Hadar Averbuch-Elor
OCL
1.1K
1
0
20 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
IEEE Reviews in Biomedical Engineering (RBME), 2024
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
767
71
0
17 Jan 2025
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Computer Vision and Pattern Recognition (CVPR), 2025
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Zhiyong Yang
Pingping Zhang
Huchuan Lu
245
17
0
15 Jan 2025
Continual Test-Time Adaptation for Single Image Defocus Deblurring via Causal Siamese Networks
International Journal of Computer Vision (IJCV), 2025
Shuang Cui
Yi Li
Jiangmeng Li
Xiongxin Tang
Fuchun Sun
Jianwei Niu
Hui Xiong
292
1
0
15 Jan 2025
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
AAAI Conference on Artificial Intelligence (AAAI), 2025
Ming Dai
Jian Li
Jiedong Zhuang
Xian Zhang
Wankou Yang
ObjD
369
13
0
12 Jan 2025
Hierarchical Alignment-enhanced Adaptive Grounding Network for Generalized Referring Expression Comprehension
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yaxian Wang
Henghui Ding
Shuting He
Xudong Jiang
Bifan Wei
Jun Liu
ObjD
261
8
0
03 Jan 2025
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
964
31
0
28 Dec 2024
Cross-Modal Few-Shot Learning with Second-Order Neural Ordinary Differential Equations
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yi Zhang
Chun-Wun Cheng
Junyi He
Zhihai He
Carola-Bibiane Schönlieb
Yuyan Chen
Angelica I Aviles-Rivero
AI4TS
321
0
0
20 Dec 2024
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
Cong Wei
Yujie Zhong
Haoxian Tan
Yingsen Zeng
Yong Liu
Zheng Zhao
Yujiu Yang
MLLM
VLM
VOS
287
11
0
18 Dec 2024
Unlocking Visual Secrets: Inverting Features with Diffusion Priors for Image Reconstruction
Sai Qian Zhang
Ziyun Li
Chuan Guo
Saeed Mahloujifar
Deeksha Dangwal
Edward Suh
B. D. Salvo
Chiao Liu
DiffM
310
2
0
11 Dec 2024
HyperSeg: Towards Universal Visual Segmentation with Large Language Model
Cong Wei
Yujie Zhong
Haoxian Tan
Yong Liu
Zheng Zhao
Jie Hu
Yujiu Yang
VOS
MLLM
VLM
LRM
274
18
0
26 Nov 2024
LaVin-DiT: Large Vision Diffusion Transformer
Computer Vision and Pattern Recognition (CVPR), 2024
Zhaoqing Wang
Xiaobo Xia
Runnan Chen
Dongdong Yu
Changhu Wang
Mingming Gong
Tongliang Liu
553
19
0
18 Nov 2024
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
European Conference on Computer Vision (ECCV), 2024
Seongsu Ha
Chaeyun Kim
Donghwa Kim
Junho Lee
Sangho Lee
Joonseok Lee
258
6
0
03 Nov 2024
On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes
Neural Information Processing Systems (NeurIPS), 2024
Rajat Modi
Vibhav Vineet
Yogesh S Rawat
329
3
0
25 Oct 2024
CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection
Andrea Appiani
Cigdem Beyan
CLIP
VLM
297
2
0
18 Oct 2024
LESS: Label-Efficient and Single-Stage Referring 3D Segmentation
Neural Information Processing Systems (NeurIPS), 2024
Xuexun Liu
Xiaoxu Xu
Jinlong Li
Qiudan Zhang
Xu Wang
Andrii Zadaianchuk
Lin Ma
356
3
0
17 Oct 2024
A Survey of Low-shot Vision-Language Model Adaptation via Representer Theorem
Kun Ding
Ying Wang
Gaofeng Meng
Shiming Xiang
VLM
285
0
0
15 Oct 2024
Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation
Kun Ding
Qiang Yu
Haojian Zhang
Gaofeng Meng
Shiming Xiang
VLM
198
2
0
11 Oct 2024
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation
Zhe Dong
Yuzhe Sun
Tianzhu Liu
Wangmeng Zuo
Yanfeng Gu
398
20
0
11 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Neural Information Processing Systems (NeurIPS), 2024
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
434
22
0
10 Oct 2024
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models
Asian Conference on Computer Vision (ACCV), 2024
Rabin Adhikari
Safal Thapaliya
Manish Dhakal
Bishesh Khanal
MLLM
VLM
282
2
0
07 Oct 2024
Segment as You Wish -- Free-Form Language-Based Segmentation for Medical Images
Longchao Da
Rui Wang
Xiaojian Xu
Parminder Bhatia
Taha A. Kass-Hout
Hua Wei
Cao Xiao
MedIm
VLM
289
2
0
02 Oct 2024
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Neural Information Processing Systems (NeurIPS), 2024
Zechen Bai
Tong He
Haiyang Mei
Pichao Wang
Ziteng Gao
Joya Chen
Lei Liu
Zheng Zhang
Mike Zheng Shou
VLM
VOS
MLLM
251
74
0
29 Sep 2024
Fully Aligned Network for Referring Image Segmentation
Visual Communications and Image Processing (VCIP), 2024
Yong-Jin Liu
Ruihao Xu
Yansong Tang
242
0
0
29 Sep 2024
A Parameter-Efficient Tuning Framework for Language-guided Object Grounding and Robot Grasping
IEEE International Conference on Robotics and Automation (ICRA), 2024
Houjian Yu
Mingen Li
Alireza Rezazadeh
Yang Yang
Changhyun Choi
541
6
0
28 Sep 2024
PTQ4RIS: Post-Training Quantization for Referring Image Segmentation
IEEE International Conference on Robotics and Automation (ICRA), 2024
Xiaoyan Jiang
Hang Yang
Kaiying Zhu
Xihe Qiu
Shibo Zhao
Sifan Zhou
MQ
156
2
0
25 Sep 2024
DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation
European Conference on Computer Vision (ECCV), 2024
Soojin Jang
Jungmin Yun
Junehyoung Kwon
Eunju Lee
Youngbin Kim
325
8
0
24 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
International Conference on Learning Representations (ICLR), 2024
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
565
25
0
23 Sep 2024
Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model
Li Zhou
Xu Yuan
Zenghui Sun
Zikun Zhou
Jingsong Lan
VLM
MLLM
861
7
0
20 Sep 2024
DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks
Amin Karimi Monsefi
Kishore Prakash Sailaja
Ali Alilooee
Ser-Nam Lim
R. Ramnath
VLM
376
16
0
10 Sep 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
European Conference on Computer Vision (ECCV), 2024
Yi-Chia Chen
Wei-Hua Li
Cheng Sun
Yu-Chiang Frank Wang
Chu-Song Chen
VLM
237
56
0
01 Sep 2024
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras
Pratik K. Mishra
Irene Ballester
Andrea Iaboni
Bing Ye
Kristine Newman
Alex Mihailidis
Shehroz S. Khan
245
2
0
28 Aug 2024
Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration
IEEE Transactions on Image Processing (TIP), 2024
Xu Zhang
Jiaqi Ma
Guoli Wang
Qian Zhang
Huan Zhang
Lefei Zhang
VLM
552
37
0
28 Aug 2024
Image Segmentation in Foundation Model Era: A Survey
Tianfei Zhou
Fei Zhang
Boyu Chang
Wenguan Wang
Ye Yuan
E. Konukoglu
Daniel Cremers
VLM
427
29
0
23 Aug 2024
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation
IEEE Transactions on Multimedia (IEEE TMM), 2024
Yubin Cho
Hyunwoo Yu
Suk-Ju Kang
282
34
0
14 Aug 2024
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
European Conference on Computer Vision (ECCV), 2024
Dahyun Kang
Minsu Cho
ObjD
VLM
385
24
0
09 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
European Conference on Computer Vision (ECCV), 2024
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas Guibas
P. Milanfar
Feng Yang
341
2
0
07 Aug 2024
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Zhaowei Li
Wei Wang
Yiqing Cai
Xu Qi
Pengyu Wang
Dong Zhang
Hang Song
Botian Jiang
Zhida Huang
Tao Wang
AIFin
LRM
216
9
0
05 Aug 2024
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
European Conference on Computer Vision (ECCV), 2024
Wei Chen
Mahdieh Hatamian
Yu Wu
238
16
0
02 Aug 2024
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
Atsuyuki Miyai
Jingkang Yang
Jingyang Zhang
Yifei Ming
Sisir Dhakal
...
Yixuan Li
Hai "Helen" Li
Ziwei Liu
Toshihiko Yamasaki
Kiyoharu Aizawa
367
29
0
31 Jul 2024
Diffusion Feedback Helps CLIP See Better
International Conference on Learning Representations (ICLR), 2024
Wenxuan Wang
Quan-Sen Sun
Fan Zhang
Yepeng Tang
Jing Liu
Xinlong Wang
VLM
331
39
0
29 Jul 2024
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
Shuting He
Henghui Ding
256
21
0
25 Jul 2024
VISA: Reasoning Video Object Segmentation via Large Language Models
Cilin Yan
Haochen Wang
Shilin Yan
Xiaolong Jiang
Yao Hu
Guoliang Kang
Weidi Xie
E. Gavves
LRM
VLM
VOS
237
92
0
16 Jul 2024
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu
Paul Hongsuck Seo
Jeany Son
DiffM
413
12
0
10 Jul 2024