SAM 2: Segment Anything in Images and Videos
arXiv: 2408.00714
International Conference on Learning Representations (ICLR), 2024
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM
MLLM
Papers citing "SAM 2: Segment Anything in Images and Videos"
50 / 859 papers shown
A multi-modal tactile fingertip design for robotic hands to enhance dexterous manipulation
Zhuowei Xu
Zilin Si
Kevin Zhang
Oliver Kroemer
Zeynep Temel
132
0
0
06 Oct 2025
SPEGNet: Synergistic Perception-Guided Network for Camouflaged Object Detection
Baber Jan
Saeed Anwar
Aiman El-Maleh
Abdul Jabbar Siddiqui
Abdul Bais
143
0
0
06 Oct 2025
Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert
Mingyu Liu
Zheng Huang
Xiaoyi Lin
Huanyi Zheng
Canyu Zhao
Zongze Du
Y. Wang
Haoyi Zhu
Hao Chen
Chunhua Shen
141
0
0
04 Oct 2025
SAMSOD: Rethinking SAM Optimization for RGB-T Salient Object Detection
Zhengyi Liu
Xinrui Wang
Xianyong Fang
Zhengzheng Tu
Linbo Wang
122
1
0
04 Oct 2025
EmbodiSwap for Zero-Shot Robot Imitation Learning
Eadom Dessalene
P. Mantripragada
Michael Maynord
Yiannis Aloimonos
LM&Ro
112
1
0
04 Oct 2025
Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields
Zhiting Mei
Ola Shorinwa
Anirudha Majumdar
137
1
0
03 Oct 2025
Med-K2N: Flexible K-to-N Modality Translation for Medical Image Synthesis
Feng Yuan
Yifan Gao
Yuehua Ye
Haoyue Li
Xin Gao
MedIm
86
0
0
03 Oct 2025
Towards Scalable and Consistent 3D Editing
Ruihao Xia
Yang Tang
Pan Zhou
DiffM
144
2
0
03 Oct 2025
Dynamic Prompt Generation for Interactive 3D Medical Image Segmentation Training
Tidiane Camaret N'dir
Alexander Pfefferle
Robin Tibor Schirrmeister
MedIm
3DH
313
2
0
03 Oct 2025
Inferring Dynamic Physical Properties from Video Foundation Models
Guanqi Zhan
Xianzheng Ma
Weidi Xie
Andrew Zisserman
VGen
156
2
0
02 Oct 2025
When Tracking Fails: Analyzing Failure Modes of SAM2 for Point-Based Tracking in Surgical Videos
Woowon Jang
Jiwon Im
Juseung Choi
Niki Rashidian
W. D. Neve
Utku Ozbulak
119
0
0
02 Oct 2025
Holistic Order Prediction in Natural Scenes
Pierre Musacchio
Hyunmin Lee
Jaesik Park
3DV
259
0
0
02 Oct 2025
IMAGEdit: Let Any Subject Transform
Fei Shen
Weihao Xu
Rui Yan
Dong Zhang
Xiangbo Shu
Jinhui Tang
VGen
120
1
0
01 Oct 2025
Affordance-Guided Diffusion Prior for 3D Hand Reconstruction
Naru Suzuki
Takehiko Ohkawa
Tatsuro Banno
Jihyun Lee
Ryosuke Furuta
Yoichi Sato
DiffM
161
1
0
01 Oct 2025
Instant4D: 4D Gaussian Splatting in Minutes
Zhanpeng Luo
Haoxi Ran
Li Lu
3DGS
VGen
177
1
0
01 Oct 2025
Assessing Foundation Models for Mold Colony Detection with Limited Training Data
Henrik Pichler
Janis Keuper
Matthew Copping
87
0
0
01 Oct 2025
Robust Context-Aware Object Recognition
Klara Janouskova
Cristian Gavrus
Jiří Matas
195
0
0
01 Oct 2025
Domain-Specialized Interactive Segmentation Framework for Meningioma Radiotherapy Planning
J. Lee
Han Jang
Kyu Sung Choi
68
0
0
01 Oct 2025
Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline
Haiyang Li
Yaxiong Wang
Lianwei Wu
Lechao Cheng
Zhun Zhong
195
2
0
30 Sep 2025
The 1st Solution for MOSEv1 Challenge on LSVOS 2025: CGFSeg
Tingmin Li
Yixuan Li
Yang Yang
VOS
215
0
0
30 Sep 2025
A Systematic Study of Large Language Models for Task and Motion Planning With PDDLStream
Jorge Mendez-Mendez
LRM
110
1
0
30 Sep 2025
Cat: Post-Training Quantization Error Reduction via Cluster-based Affine Transformation
Ali Zoljodi
Radu Timofte
Masoud Daneshtalab
MQ
148
0
0
30 Sep 2025
NeoWorld: Neural Simulation of Explorable Virtual Worlds via Progressive 3D Unfolding
Yanpeng Zhao
Shanyan Guan
Y Samuel Wang
Yanhao Ge
Wei-Jang Li
Xiaokang Yang
VGen
131
0
0
29 Sep 2025
Triangle Splatting+: Differentiable Rendering with Opaque Triangles
Jan Held
Renaud Vandeghen
Sanghyun Son
Daniel Rebain
Matheus Gadelha
Yi Zhou
Ming-Chyuan Lin
Marc Van Droogenbroeck
Andrea Tagliasacchi
3DGS
118
2
0
29 Sep 2025
IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks
Eric Hannus
Miika Malin
Tran Minh Son Le
Ville Kyrki
VLM
96
1
0
29 Sep 2025
LayerD: Decomposing Raster Graphic Designs into Layers
Tomoyuki Suzuki
Kang-Jun Liu
Naoto Inoue
Kota Yamaguchi
158
3
0
29 Sep 2025
Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models
Beomseok Kang
Niluthpol Chowdhury Mithun
Mikhail Sizintsev
Han-Pang Chiu
S. Samarasekera
104
0
0
28 Sep 2025
Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning
Y Samuel Wang
Zeyu Xue
Mujie Liu
Tongqin Zhang
Yan Hu
Zhou Zhao
Chenguang Yang
Zhenyu Lu
179
0
0
27 Sep 2025
RAU: Reference-based Anatomical Understanding with Vision Language Models
Yiwei Li
Y. Liu
Jiaqi Guo
Lin Zhao
Zheyuan Zhang
Xiao Chen
Boris Mailhe
Ankush Mukherjee
Terrence Chen
Shanhui Sun
148
2
0
26 Sep 2025
PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
Zhe Zhu
Le Wan
Rui-Xue Xu
Y. Zhang
Honghua Chen
Zhiyang Dou
Cheng Lin
Yuan Liu
Mingqiang Wei
VLM
200
1
0
26 Sep 2025
CubistMerge: Spatial-Preserving Token Merging For Diverse ViT Backbones
Wenyi Gong
Mieszko Lis
155
0
0
26 Sep 2025
SingRef6D: Monocular Novel Object Pose Estimation with a Single RGB Reference
Jiahui Wang
H. Zhu
Haoren Guo
Abdullah Al Mamun
Cheng Xiang
T. Lee
132
0
0
26 Sep 2025
MultiCrafter: High-Fidelity Multi-Subject Generation via Disentangled Attention and Identity-Aware Preference Alignment
Tao Wu
Yibo Jiang
Yehao Lu
Zhizhong Wang
Longxiang Zhang
Zequn Qin
Xi Li
206
1
0
26 Sep 2025
VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation
Huayi Zhou
Kui Jia
LM&Ro
191
0
0
26 Sep 2025
Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning
Zilun Zhang
Zian Guan
T. Zhao
H. Shen
Tianyu Li
Yuxiang Cai
Zhonggen Su
Zhaojun Liu
Jianwei Yin
Xiang Li
ObjD
LRM
242
3
0
26 Sep 2025
SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks
Jialiang Li
Wenzheng Wu
Gaojing Zhang
Yifan Han
Wenzhao Lian
LM&Ro
132
0
0
26 Sep 2025
LG-CD: Enhancing Language-Guided Change Detection through SAM2 Adaptation
Yixiao Liu
Yizhou Yang
Jinwen Li
Jun Tao
R. Li
Xiangkun Wang
Min Zhu
Junlong Cheng
160
0
0
26 Sep 2025
RefAM: Attention Magnets for Zero-Shot Referral Segmentation
Anna Kukleva
Enis Simsar
A. Tonioni
Muhammad Ferjad Naeem
F. Tombari
J. E. Lenssen
Bernt Schiele
DiffM
VLM
641
0
0
26 Sep 2025
Drag4D: Align Your Motion with Text-Driven 3D Scene Generation
Minjun Kang
Inkyu Shin
Taeyeop Lee
In So Kweon
Kuk-Jin Yoon
117
0
0
26 Sep 2025
NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics
Yu Yuan
Xijun Wang
Tharindu Wickremasinghe
Zeeshan Nadir
Bole Ma
Stanley H. Chan
DiffM
VGen
PINN
1.5K
10
0
25 Sep 2025
Dense Semantic Matching with VGGT Prior
Songlin Yang
Tianyi Wei
Yushi Lan
Zeqi Xiao
Anyi Rao
Xingang Pan
3DV
192
0
0
25 Sep 2025
Joint Flow Trajectory Optimization For Feasible Robot Motion Generation from Video Demonstrations
Xiaoxiang Dong
Matthew Johnson-Roberson
Weiming Zhi
85
0
0
25 Sep 2025
UniTransfer: Video Concept Transfer via Progressive Spatial and Timestep Decomposition
Guojun Lei
Rong Zhang
Chi-Yin Wang
Tianhang Liu
Hong Li
Zhiyuan Ma
W. Xu
VGen
154
0
0
25 Sep 2025
Does FLUX Already Know How to Perform Physically Plausible Image Composition?
Shilin Lu
Zhuming Lian
Zihan Zhou
Shaocong Zhang
Chen Zhao
A. Kong
311
11
0
25 Sep 2025
Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection
Yu Guo
Shengfeng He
Yuxu Lu
Haonan An
Yihang Tao
Huilin Zhu
Jingxian Liu
Yuguang Fang
246
1
0
25 Sep 2025
Video models are zero-shot learners and reasoners
Thaddäus Wiedemer
Yuxuan Li
Paul Vicol
Shixiang Shane Gu
Nick Matarese
Kevin Swersky
Been Kim
P. Jaini
Robert Geirhos
VLM
LRM
248
56
0
24 Sep 2025
Attack for Defense: Adversarial Agents for Point Prompt Optimization Empowering Segment Anything Model
Xueyu Liu
Xiaoyi Zhang
Guangze Shi
Meilin Liu
Yexin Lai
Yongfei Wu
Mingqiang Wei
LLMAG
AAML
108
1
0
23 Sep 2025
MV-UMI: A Scalable Multi-View Interface for Cross-Embodiment Learning
Omar Rayyan
John Abanes
Mahmoud Hafez
Anthony Tzes
Fares Abu-Dakka
102
0
0
23 Sep 2025
Sa2VA-i: Improving Sa2VA Results with Consistent Training and Inference
Alexey Nekrasov
A. Athar
Daan de Geus
Alexander Hermans
Bastian Leibe
172
0
0
23 Sep 2025
The 1st Solution for MOSEv2 Challenge 2025: Long-term and Concept-aware Video Segmentation via SeC
Mingqi Gao
Jingkun Chen
Yunqi Miao
Gengshen Wu
Zhijin Qin
Jungong Han
120
0
0
23 Sep 2025