arXiv:2304.02643
Segment Anything
5 April 2023
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
Laura Gustafson
Tete Xiao
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
Papers citing "Segment Anything"
50 / 4,189 papers shown
SPOC: Spatially-Progressing Object State Change Segmentation in Video
Priyanka Mandikal
Tushar Nagarajan
Alex Stoken
Zihui Xue
Kristen Grauman
46
0
0
15 Mar 2025
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
Ruijie Lu
Yixin Chen
Yu Liu
Jiaxiang Tang
Junfeng Ni
Diwen Wan
Gang Zeng
Siyuan Huang
DiffM
VGen
51
3
0
15 Mar 2025
E-SAM: Training-Free Segment Every Entity Model
Weiming Zhang
Dingwen Xiao
Lei Chen
Lin Wang
VLM
57
0
0
15 Mar 2025
EmoAgent: Multi-Agent Collaboration of Plan, Edit, and Critic, for Affective Image Manipulation
Qi Mao
Haobo Hu
Yujie He
Difei Gao
Haokun Chen
Libiao Jin
DiffM
50
0
0
14 Mar 2025
EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting
Di Li
Jie Feng
Jiahao Chen
Weisheng Dong
Guanbin Li
G. Shi
Licheng Jiao
3DGS
VLM
201
0
0
14 Mar 2025
SpaceSeg: A High-Precision Intelligent Perception Segmentation Method for Multi-Spacecraft On-Orbit Targets
Hao Liu
Pengyu Guo
Siyuan Yang
Zeqing Jiang
Qinglei Hu
Dongyu Li
48
0
0
14 Mar 2025
COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation
Sanghyun Jo
Seo Jin Lee
Seungwoo Lee
Seohyung Hong
Hyungseok Seo
Kyungsu Kim
48
0
0
14 Mar 2025
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space
Weichen Zhan
Zile Zhou
Zhiheng Zheng
Chen Gao
Jinqiang Cui
Yong Li
Xinlei Chen
Xiao-Ping Zhang
LRM
63
1
0
14 Mar 2025
Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation
Hiroyasu Akada
Jian Wang
Vladislav Golyanik
Christian Theobalt
EgoV
81
0
0
14 Mar 2025
SDF-TopoNet: A Two-Stage Framework for Tubular Structure Segmentation via SDF Pre-training and Topology-Aware Fine-Tuning
Siyi Wu
Liang Zhao
Haotian Ma
Xinyuan Song
49
0
0
14 Mar 2025
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
A. Nassar
Andres Marafioti
Matteo Omenetti
Maksym Lysak
Nikolaos Livathinos
...
Yusik Kim
A. Said Gurbuz
Michele Dolfi
Miquel Farré
Peter W. J. Staar
61
3
0
14 Mar 2025
Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling
Christopher Xie
A. Avetisyan
Henry Howard-Jenkins
Yawar Siddiqui
Julian Straub
Richard Newcombe
Vasileios Balntas
Jakob Julian Engel
3DH
3DV
70
0
0
14 Mar 2025
Quantifying Interpretability in CLIP Models with Concept Consistency
Avinash Madasu
Vasudev Lal
Phillip Howard
VLM
69
0
0
14 Mar 2025
Piece it Together: Part-Based Concepting with IP-Priors
Elad Richardson
Kfir Goldberg
Yuval Alaluf
Daniel Cohen-Or
DiffM
66
0
0
13 Mar 2025
GroomLight: Hybrid Inverse Rendering for Relightable Human Hair Appearance Modeling
Yang Zheng
Menglei Chai
Delio Vicini
Yuxiao Zhou
Yinghao Xu
Leonidas J. Guibas
Gordon Wetzstein
Thabo Beeler
3DH
56
0
0
13 Mar 2025
Eye on the Target: Eye Tracking Meets Rodent Tracking
Emil Mededovic
Yuli Wu
Henning Konermann
Marcin Kopaczka
Mareike Schulz
René Tolba
Johannes Stegmaier
66
0
0
13 Mar 2025
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
59
0
0
13 Mar 2025
Long-Video Audio Synthesis with Multi-Agent Collaboration
Yehang Zhang
Xinli Xu
Xiaojie Xu
L. Liu
Yuxiao Chen
DiffM
VGen
53
0
0
13 Mar 2025
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
Yongsheng Yu
Ziyun Zeng
Haitian Zheng
Jiebo Luo
DiffM
64
0
0
13 Mar 2025
The Power of One: A Single Example is All it Takes for Segmentation in VLMs
Mir Rayat Imtiaz Hossain
Mennatullah Siam
Leonid Sigal
James J. Little
MLLM
VLM
Presented at ResearchTrend Connect | VLM on 21 May 2025
90
0
0
13 Mar 2025
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
R. Hu
Lianghui Zhu
Yuxuan Zhang
Tianheng Cheng
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
ObjD
61
0
0
13 Mar 2025
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
Jiali Yao
Xinran Deng
Xin Gu
Mengrui Dai
Bing Fan
Zhipeng Zhang
Yan Huang
Heng Fan
L. Zhang
61
0
0
13 Mar 2025
IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models
Yiyang Ling
Karan Owalekar
Oluwatobiloba Adesanya
Erdem Bıyık
Daniel Seita
52
1
0
13 Mar 2025
6D Object Pose Tracking in Internet Videos for Robotic Manipulation
Georgy Ponimatkin
Martin Cífka
Tomáš Souček
Médéric Fourmy
Yann Labbé
Vladimir Petrik
Josef Sivic
52
1
0
13 Mar 2025
Interpretable Image Classification via Non-parametric Part Prototype Learning
Zhijie Zhu
Lei Fan
Maurice Pagnucco
Yang Song
55
0
0
13 Mar 2025
Spiritus: An AI-Assisted Tool for Creating 2D Characters and Animations
Qirui Sun
Yunyi Ni
Teli Yuan
Jiawei Zhang
Fan Yang
Zhihao Yao
Haipeng Mi
DiffM
50
0
0
13 Mar 2025
Investigating and Improving Counter-Stereotypical Action Relation in Text-to-Image Diffusion Models
Sina Malakouti
Adriana Kovashka
EGVM
72
0
0
13 Mar 2025
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection
Zhen Qu
Xian Tao
Xinyi Gong
Shichen Qu
Qiyu Chen
Zhengtao Zhang
Xingang Wang
Guiguang Ding
VLM
64
0
0
13 Mar 2025
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Yancheng Cai
Fei Yin
Dounia Hammou
Rafal Mantiuk
VLM
Presented at ResearchTrend Connect | VLM on 14 Mar 2025
147
1
0
13 Mar 2025
CoSTA∗: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing
Advait Gupta
NandaKiran Velaga
Dang Nguyen
Dinesh Manocha
DiffM
68
0
0
13 Mar 2025
OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer
Jinyang Li
En Yu
Sijia Chen
Wenbing Tao
75
1
0
13 Mar 2025
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation
Zixian Liu
Mingtong Zhang
Yunzhu Li
54
0
0
13 Mar 2025
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li
Renping Zhou
Jiawei Zhou
Yingwei Song
Johannes Herter
Minghan Qin
Gao Huang
Hanspeter Pfister
3DGS
VLM
66
0
0
13 Mar 2025
PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
Runze He
Bo Cheng
Yuhang Ma
Qingxiang Jia
Shanyuan Liu
Ao Ma
Xiaoyu Wu
Liebucha Wu
Dawei Leng
Yuhui Yin
DiffM
VLM
54
0
0
13 Mar 2025
Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA
Zhixuan Li
Hyunse Yoon
Sanghoon Lee
Weisi Lin
57
0
0
13 Mar 2025
Zero-Shot Subject-Centric Generation for Creative Application Using Entropy Fusion
Kaifeng Zou
Xiaoyi Feng
Peng Wang
Tao Huang
Zizhou Huang
Zhang Haihang
Yuntao Zou
Dagang Li
DiffM
51
0
0
12 Mar 2025
Evaluation of state-of-the-art deep learning models in the segmentation of the heart ventricles in parasternal short-axis echocardiograms
Julian Rene Cuellar Buritica
Vu Dinh
Manjula Burri
Julie Roelandts
James Wendling
Jon D. Klingensmith
76
0
0
12 Mar 2025
NVP-HRI: Zero Shot Natural Voice and Posture-based Human-Robot Interaction via Large Language Model
Yuzhi Lai
Shenghai Yuan
Youssef Nassar
Mingyu Fan
T. Weber
Matthias Rätsch
LM&Ro
64
3
0
12 Mar 2025
Unified Dense Prediction of Video Diffusion
Lehan Yang
Lu Qi
Xianrui Li
Sheng Li
Varun Jampani
Ming Yang
MDE
VOS
VGen
63
0
0
12 Mar 2025
InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images
Jiun Tian Hoe
Weipeng Hu
Wei Zhou
Chao Xie
Ziwei Wang
Chee Seng Chan
Xudong Jiang
Y. Tan
61
0
0
12 Mar 2025
Online Language Splatting
Saimouli Katragadda
Cho-Ying Wu
Yuliang Guo
Xinyu Huang
Guoquan Huang
Liu Ren
3DGS
OffRL
65
0
0
12 Mar 2025
Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models
Héctor Laria
Alexandra Gomez-Villa
Jiang Qin
Muhammad Atif Butt
Bogdan Raducanu
Javier Vázquez-Corral
Joost van de Weijer
Kai Wang
DiffM
65
0
0
12 Mar 2025
Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction
Xinyu Zhang
Haonan Chang
Yuhan Liu
Abdeslam Boularias
3DGS
39
0
0
12 Mar 2025
Training Data Provenance Verification: Did Your Model Use Synthetic Data from My Generative Model for Training?
Yuechen Xie
Jie Song
Huiqiong Wang
Mingli Song
55
0
0
12 Mar 2025
Polygonizing Roof Segments from High-Resolution Aerial Images Using Yolov8-Based Edge Detection
Qipeng Mei
Dimitri Bulatov
Dorota Iwaszczuk
59
0
0
12 Mar 2025
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation
Hariprasath Govindarajan
Maciej K. Wozniak
Marvin Klingner
Camille Maurice
B. R. Kiran
S. Yogamani
55
0
0
12 Mar 2025
ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation
Tobias Christian Nauen
Brian B. Moser
Federico Raue
Stanislav Frolov
Andreas Dengel
ViT
60
0
0
12 Mar 2025
GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation
Ruihai Wu
Ziyu Zhu
Yuran Wang
Yue Chen
Jiarui Wang
Hao Dong
63
0
0
12 Mar 2025
PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling
Nikolai Korber
Eduard Kromer
Andreas Siebert
S. Hauke
Daniel Mueller-Gritschneder
Björn Schuller
56
0
0
12 Mar 2025
Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter
Kechun Xu
Xunlong Xia
Kaixuan Wang
Yifei Yang
Yunxuan Mao
Bing Deng
R. Xiong
Yansen Wang
OffRL
72
0
0
12 Mar 2025