2312.00869
Versions: v1, v2 (latest)
Segment and Caption Anything
Computer Vision and Pattern Recognition (CVPR), 2024
1 December 2023
Xiaoke Huang, Jianfeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu
MLLM, VLM
Papers citing "Segment and Caption Anything" (11 papers)
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
Yuqian Yuan, W. Zhang, Xin Li, Shihao Wang, Kehan Li, Wentong Li, Jun Xiao, Lei Zhang, Beng Chin Ooi
ObjD
27 Oct 2025
SAMSOD: Rethinking SAM Optimization for RGB-T Salient Object Detection
Zhengyi Liu, Xinrui Wang, Xianyong Fang, Zhengzheng Tu, Linbo Wang
04 Oct 2025
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
Yiming Ren, Zhiqiang Lin, Yu Li, Gao Meng, Weiyun Wang, ..., Zicheng Lin, Jifeng Dai, Yujiu Yang, Wenhai Wang, Ruihang Chu
17 Jul 2025
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
Yong-Jin Liu, SongLi Wu, Sule Bai, Jiahao Wang, Yitong Wang, Yansong Tang
VLM, VOS
19 Jun 2025
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions
Tommaso Galliena, Tommaso Apicella, Stefano Rosa, Pietro Morerio, Alessio Del Bue, Lorenzo Natale
11 Apr 2025
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Yunlong Tang, Jing Bi, Chao Huang, Susan Liang, Daiki Shimada, ..., Jinxi He, Liu He, Zeliang Zhang, Jiebo Luo, Chenliang Xu
07 Apr 2025
Frontiers in Intelligent Colonoscopy
Ge-Peng Ji, Jingyi Liu, Peng Xu, Nick Barnes, Fahad Shahbaz Khan, Salman Khan, Deng-Ping Fan
22 Oct 2024
An Augmentation-based Model Re-adaptation Framework for Robust Image Segmentation
Zheming Zuo, Joseph Smith, Jonathan Stonehouse, Boguslaw Obara
VLM
14 Sep 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye, Yukang Gan, Xiaoke Huang, Yixiao Ge, Yansong Tang
MLLM, VLM
18 Jun 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin, Xinyu Wei, Ruichuan An, Shiyang Feng, Bocheng Zou, Yulin Luo, Siyuan Huang, Shanghang Zhang, Jiaming Song
VLM
29 Mar 2024
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang, Pei Sun, Shoufa Chen, Min Xiao, Wenqi Shao, Wenwei Zhang, Yu Liu, Kai-xiang Chen, Ping Luo
MLLM, VLM
07 Jul 2023