arXiv: 2408.00714
SAM 2: Segment Anything in Images and Videos
International Conference on Learning Representations (ICLR), 2025
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM
MLLM
Papers citing "SAM 2: Segment Anything in Images and Videos"
50 / 860 papers shown
Object-Centric 3D Gaussian Splatting for Strawberry Plant Reconstruction and Phenotyping
Jiajia Li
Keyi Zhu
Qianwen Zhang
Dong Chen
Qi Sun
Zhaojian Li
84
0
0
04 Nov 2025
Pinpointing Trigger Moment for Grounded Video QA: Enhancing Spatio-temporal Grounding in Multimodal Large Language Models
Jinhwan Seo
Y. Cho
Junhyug Noh
Sung-eui Yoon
49
0
0
04 Nov 2025
UniChange: Unifying Change Detection with Multimodal Large Language Model
Xu-Yao Zhang
Danyang Li
Xiaohang Dong
Tianhao Wu
Hualong Yu
Jianye Wang
Qicheng Li
Xiang Li
MLLM
348
0
0
04 Nov 2025
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
Wenqi Liang
Gan Sun
Yao He
Jiahua Dong
Suyan Dai
Ivan Laptev
Salman Khan
Yang Cong
LM&Ro
3DV
VLM
202
2
0
03 Nov 2025
RefVTON: person-to-person Try on with Additional Unpaired Visual Reference
Liuzhuozheng Li
Yue Gong
Shanyuan Liu
Bo Cheng
Yuhang Ma
Liebucha Wu
Dengyang Jiang
Zanyi Wang
Dawei Leng
Yuhui Yin
351
0
0
02 Nov 2025
Understanding the Implicit User Intention via Reasoning with Large Language Model for Image Editing
Yijia Wang
Yiqing Shen
Weiming Chen
Z. He
DiffM
145
0
0
31 Oct 2025
AD-SAM: Fine-Tuning the Segment Anything Vision Foundation Model for Autonomous Driving Perception
Mario Camarena
Het Patel
Fatemeh Nazari
Evangelos E. Papalexakis
Mohamadhossein Noruzoliaee
Jia Chen
VLM
169
0
0
30 Oct 2025
LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation
Yang Miao
Jan-Nico Zaech
Xi Wang
Fabien Despinoy
Danda Pani Paudel
Luc Van Gool
VLM
330
0
0
29 Oct 2025
Octopus-like Reaching Motion: A Perspective Inspired by Whipping
Shengyao Zhang
Yiyuan Zhang
C. Zhang
Yiming Li
Wenci Xin
Yuliang Liufu
Hong Wei Ng
Cecilia Laschi
93
0
0
29 Oct 2025
Generative AI for Healthcare: Fundamentals, Challenges, and Perspectives
Gang Chen
Changshuo Liu
Gene Anne Ooi
Marcus Tan
Zhongle Xie
Jianwei Yin
J. Yip
Wenqiao Zhang
Jiaqi Zhu
Beng Chin Ooi
LM&MA
289
0
0
28 Oct 2025
Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
Ziqi Zhou
Yifan Hu
Yufei Song
Zijing Li
Shengshan Hu
Leo Yu Zhang
Dezhong Yao
Long Zheng
Hai Jin
AAML
186
6
0
28 Oct 2025
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Inclusion AI
Bowen Ma
Cheng Zou
C. Yan
Chunxiang Jin
...
Zhiqiang Fang
Zhihao Qiu
Ziyuan Huang
Zizheng Yang
Z. He
MLLM
MoE
VLM
350
2
0
28 Oct 2025
World Simulation with Video Foundation Models for Physical AI
Nvidia
A. M. Ali
Junjie Bai
Maciej Bala
Yogesh Balaji
...
Jing Zhang
Qinsheng Zhang
Kaiwen Zheng
Andrew Zhu
Yuke Zhu
VGen
PINN
462
21
0
28 Oct 2025
Localising under the drape: proprioception in the era of distributed surgical robotic system
M. Huber
N. Cavalcanti
Ayoob Davoodi
Ruixuan Li
Christopher E. Mower
...
Emmanuel Vander Poorten
Philipp Fürnstahl
Sebastien Ourselin
Christos Bergeles
Tom Vercauteren
125
0
0
27 Oct 2025
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity
Yuqian Yuan
W. Zhang
Xin Li
Shihao Wang
Kehan Li
Wentong Li
Jun Xiao
Lei Zhang
Beng Chin Ooi
ObjD
366
1
0
27 Oct 2025
Track, Inpaint, Resplat: Subject-driven 3D and 4D Generation with Progressive Texture Infilling
Shuhong Zheng
Ashkan Mirzaei
Igor Gilitschenski
DiffM
VGen
195
0
0
27 Oct 2025
Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views
Anna Deichler
Jonas Beskow
VGen
144
0
0
26 Oct 2025
Generalizable Hierarchical Skill Learning via Object-Centric Representation
Haibo Zhao
Yu Qi
Boce Hu
Yizhe Zhu
Ziyan Chen
...
Owen Howell
Haojie Huang
Robin Walters
Dian Wang
Robert Platt
147
0
0
24 Oct 2025
FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning
Lu Zhang
Jiazuo Yu
Haomiao Xiong
Ping Hu
Yunzhi Zhuge
Huchuan Lu
You He
LRM
144
0
0
24 Oct 2025
S3OD: Towards Generalizable Salient Object Detection with Synthetic Data
Orest Kupyn
Hirokatsu Kataoka
Christian Rupprecht
126
1
0
24 Oct 2025
Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
Qixiu Li
Yu Deng
Yaobo Liang
L. Luo
Lei Zhou
...
Hao Chen
Lily Sun
Dong Chen
J. Yang
B. Guo
129
8
0
24 Oct 2025
MUVR: A Multi-Modal Untrimmed Video Retrieval Benchmark with Multi-Level Visual Correspondence
Yue Feng
Jinwei Hu
Qijia Lu
Jiawei Niu
Li Tan
...
Shiping Ge
Ethan Q. Chen
Wentong Li
Limin Wang
Jie Qin
128
0
0
24 Oct 2025
Thermal Polarimetric Multi-view Stereo
Takahiro Kushida
Kenichiro Tanaka
3DV
141
0
0
23 Oct 2025
COS3D: Collaborative Open-Vocabulary 3D Segmentation
Runsong Zhu
Ka-Hei Hui
Zhengzhe Liu
Qianyi Wu
Weiliang Tang
Shi Qiu
Pheng-Ann Heng
Chi-Wing Fu
3DGS
161
1
0
23 Oct 2025
HRT1: One-Shot Human-to-Robot Trajectory Transfer for Mobile Manipulation
Sai Haneesh Allu
Jishnu Jaykumar P
Ninad Khargonkar
Tyler Summers
Jian Yao
Yu Xiang
112
0
0
23 Oct 2025
PPMStereo: Pick-and-Play Memory Construction for Consistent Dynamic Stereo Matching
Yun Wang
Junjie Hu
Qiaole Dong
Y. Zhang
Yanwei Fu
Tin Lun Lam
Dapeng Wu
111
0
0
23 Oct 2025
GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs
Guanghao Zheng
Bowen Shi
Mingxing Xu
Ruoyu Sun
Peisen Zhao
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Xiaopeng Zhang
Qi Tian
VLM
161
0
0
23 Oct 2025
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
GigaBrain Team
Angen Ye
Boyuan Wang
Chaojun Ni
Guan Huang
...
Yukun Zhou
Z. Dong
Z. J. Wang
Zhichao Liu
Zheng Hua Zhu
LM&Ro
VLM
447
7
0
22 Oct 2025
Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation
Su Ho Han
Jeongseok Hyun
Pilhyeon Lee
Minho Shim
Dongyoon Wee
Seon Joo Kim
VOS
VLM
241
0
0
22 Oct 2025
Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-language Models
Mingen Li
Houjian Yu
Yixuan Huang
Youngjin Hong
Changhyun Choi
129
0
0
22 Oct 2025
SAM 2++: Tracking Anything at Any Granularity
J. Zhang
C. Liang
Yichun Yang
Chenkai Zeng
Yutao Cui
Xinwen Zhang
Xin Zhou
Kai Ma
Gangshan Wu
Limin Wang
218
0
0
21 Oct 2025
Automated urban waterlogging assessment and early warning through a mixture of foundation models
Chenxu Zhang
Fuxiang Huang
Lei Zhang
152
0
0
21 Oct 2025
EMA-SAM: Exponential Moving-average for SAM-based PTMC Segmentation
Maryam Dialameh
Hossein Rajabzadeh
Jung Suk Sim
Hyock Ju Kwon
149
0
0
21 Oct 2025
CaMiT: A Time-Aware Car Model Dataset for Classification and Generation
Frédéric Lin
Biruk Abere Ambaw
Adrian Daniel Popescu
Hejer Ammar
Romaric Audigier
Hervé Le Borgne
VLM
AI4TS
284
0
0
20 Oct 2025
Botany-Bot: Digital Twin Monitoring of Occluded and Underleaf Plant Structures with Gaussian Splats
Simeon Adebola
Chung Min Kim
Justin Kerr
Shuangyu Xie
Prithvi Akella
Jose Luis Susa Rincon
Eugen Solowjow
Ken Goldberg
193
0
0
20 Oct 2025
Expose Camouflage in the Water: Underwater Camouflaged Instance Segmentation and Dataset
C. Wang
Hua Li
Chongyi Li
Huazhong Liu
Xiongxin Tang
Sam Kwong
104
0
0
20 Oct 2025
World-in-World: World Models in a Closed-Loop World
Jiahan Zhang
Muqing Jiang
Nanru Dai
Taiming Lu
Arda Uzunoglu
...
Rama Chellappa
Tianmin Shu
Alan Yuille
Yilun Du
Jieneng Chen
VGen
VLM
234
6
0
20 Oct 2025
Segmentation as A Plug-and-Play Capability for Frozen Multimodal LLMs
Jiazhen Liu
Long Chen
MLLM
VLM
163
2
0
19 Oct 2025
Safe Payload Transfer with Ship-Mounted Cranes: A Robust Model Predictive Control Approach
Ersin Daş
William A. Welch
Patrick Spieler
Keenan Albee
Aurelio Noca
...
Anna Sabel
Grace Lim
Rohan Thakker
Amir Rahmani
J. W. Burdick
66
0
0
19 Oct 2025
Pursuing Minimal Sufficiency in Spatial Reasoning
Yejie Guo
Yunzhong Hou
Wufei Ma
Meng Tang
Ming-Hsuan Yang
LRM
100
0
0
19 Oct 2025
How Universal Are SAM2 Features?
Masoud Khairi Atani
Alon Harell
Hyomin Choi
Runyu Yang
Fabien Racapé
Ivan V. Bajić
VLM
132
0
0
19 Oct 2025
Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis
Mohammad Javad Ahmadi
Iman Gandomi
Parisa Abdi
Seyed-Farzad Mohammadi
Amirhossein Taslimi
Mehdi Khodaparast
Hassan Hashemi
Mahdi Tavakoli
H. Taghirad
109
0
0
18 Oct 2025
Promptable Fire Segmentation: Unleashing SAM2's Potential for Real-Time Mobile Deployment with Strategic Bounding Box Guidance
Emmanuel U. Ugwu
Zhang Xinming
VLM
113
0
0
18 Oct 2025
TokenAR: Multiple Subject Generation via Autoregressive Token-level enhancement
Haiyue Sun
Qingdong He
Jinlong Peng
Peng Tang
Jiangning Zhang
Junwei Zhu
Xiaobin Hu
Shuicheng Yan
DiffM
VGen
116
0
0
18 Oct 2025
Memory-SAM: Human-Prompt-Free Tongue Segmentation via Retrieval-to-Prompt
Joongwon Chae
Lihui Luo
Xi Yuan
Dongmei Yu
Zhenglin Chen
Lian Zhang
Peiwu Qin
VLM
129
0
0
17 Oct 2025
Uncertainty-Aware Extreme Point Tracing for Weakly Supervised Ultrasound Image Segmentation
Lei Shi
Gang Li
Junxing Zhang
105
0
0
17 Oct 2025
Proactive Scene Decomposition and Reconstruction
Baicheng Li
Zike Yan
Dong Wu
H. Zha
104
0
0
17 Oct 2025
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
Han Zhao
Jiaxuan Zhang
Wenxuan Song
Pengxiang Ding
Donglin Wang
98
2
0
16 Oct 2025
3D Scene Prompting for Scene-Consistent Camera-Controllable Video Generation
J. Lee
Jaewoo Jung
Jisang Han
Takuya Narihira
Kazumi Fukuda
Junyoung Seo
Sunghwan Hong
Yuki Mitsufuji
Seungryong Kim
VGen
126
1
0
16 Oct 2025
Generalist vs Specialist Time Series Foundation Models: Investigating Potential Emergent Behaviors in Assessing Human Health Using PPG Signals
Saurabh Kataria
Yi Wu
Zhaoliang Chen
Hyunjung Gloria Kwak
Yuhao Xu
...
C. Jabaley
Tim Buchman
Sivasubramanium V Bhavani
Randall J Lee
Xiao Hu
AI4TS
AI4MH
LM&MA
200
0
0
16 Oct 2025