2408.00714
Cited By
SAM 2: Segment Anything in Images and Videos
International Conference on Learning Representations (ICLR), 2024
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM
MLLM
HuggingFace (116 upvotes)
Papers citing "SAM 2: Segment Anything in Images and Videos"
50 / 813 papers shown
Title
ReSAM: Refine, Requery, and Reinforce: Self-Prompting Point-Supervised Segmentation for Remote Sensing Images
M. Naseer Subhani
103
0
0
26 Nov 2025
Motion Marionette: Rethinking Rigid Motion Transfer via Prior Guidance
Haoxuan Wang
Jiachen Tao
Junyi Wu
Gaowen Liu
Ramana Rao Kompella
Yan Yan
VGen
109
0
0
25 Nov 2025
Zoo3D: Zero-Shot 3D Object Detection at Scene Level
Andrey Lemeshko
Bulat Gabdullin
Nikita Drozdov
Anton Konushin
D. Rukhovich
Maksim Kolodiazhnyi
3DPC
ObjD
VLM
274
0
0
25 Nov 2025
The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation
Weijia Mao
Hao Chen
Zhenheng Yang
Mike Zheng Shou
EGVM
176
0
0
25 Nov 2025
SAM-MI: A Mask-Injected Framework for Enhancing Open-Vocabulary Semantic Segmentation with SAM
Lin Chen
Yingjian Zhu
Qi Yang
Xin Niu
Kun Ding
Shiming Xiang
VLM
81
0
0
25 Nov 2025
RADSeg: Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglomerative Models
Omar Alama
Darshil Jariwala
A. Bhattacharya
Seungchan Kim
Wenshan Wang
Sebastian A. Scherer
VLM
48
0
0
24 Nov 2025
IDEAL-M3D: Instance Diversity-Enriched Active Learning for Monocular 3D Detection
Johannes Meier
Florian Günther
Riccardo Marin
Oussema Dhaouadi
Jacques Kaiser
Daniel Cremers
72
0
0
24 Nov 2025
MedSAM3: Delving into Segment Anything with Medical Concepts
Anglin Liu
Rundong Xue
Xu Cao
Yifan Shen
Yi Lu
Xiang Li
Qianqian Chen
Jintai Chen
MedIm
VLM
144
0
0
24 Nov 2025
Vision-Language Enhanced Foundation Model for Semi-supervised Medical Image Segmentation
Jiaqi Guo
Mingzhen Li
Hanyu Su
Santiago López
Lexiaozi Fan
Daniel Kim
Aggelos K. Katsaggelos
VLM
164
0
0
24 Nov 2025
IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes
Carl Lindström
Mahan Rafidashti
M. Fatemi
Lars Hammarstrand
Martin R. Oswald
Lennart Svensson
3DGS
106
0
0
24 Nov 2025
Re-Key-Free, Risky-Free: Adaptable Model Usage Control
Zihan Wang
Zhongkui Ma
Xinguo Feng
Chuan Yan
Dongge Liu
Ruoxi Sun
Derui Wang
Minhui Xue
Guangdong Bai
AAML
104
0
0
24 Nov 2025
LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models
Shuai Wang
D. Zhang
Tianyi Bai
Shitong Shao
Jiebo Luo
Jiaheng Wei
VLM
96
0
0
24 Nov 2025
NI-Tex: Non-isometric Image-based Garment Texture Generation
Hui Shan
Ming Li
Haitao Yang
Kai Zheng
Sizhe Zheng
Yanwei Fu
Xiangru Huang
3DH
176
0
0
24 Nov 2025
CataractCompDetect: Intraoperative Complication Detection in Cataract Surgery
Bhuvan Sachdeva
Sneha Kumari
Rudransh Agarwal
Shalaka Kumaraswamy
Niharika Singri Prasad
...
Raphael Lechtenboehmer
M. Wintergerst
T. Schultz
K. Murali
Mohit Jain
48
0
0
24 Nov 2025
One Attention, One Scale: Phase-Aligned Rotary Positional Embeddings for Mixed-Resolution Diffusion Transformer
Haoyu Wu
Jingyi Xu
Qiaomu Miao
Dimitris Samaras
H. Le
48
0
0
24 Nov 2025
Ref-SAM3D: Bridging SAM3D with Text for Reference 3D Reconstruction
Yun Zhou
Yaoting Wang
Guangquan Jie
Jinyu Liu
Henghui Ding
44
0
0
24 Nov 2025
SatSAM2: Motion-Constrained Video Object Tracking in Satellite Imagery using Promptable SAM2 and Kalman Priors
Ruijie Fan
Junyan Ye
Huan Chen
Z. Huang
Xiaolei Wang
Weijia Li
68
0
0
23 Nov 2025
MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models
Xiyang Wu
Zongxia Li
Jihui Jin
Guangyao Shi
Gouthaman KV
Vishnu Raj
Nilotpal Sinha
Jingxi Chen
Fan Du
Dinesh Manocha
60
0
0
23 Nov 2025
Uncertainty Quantification in HSI Reconstruction using Physics-Aware Diffusion Priors and Optics-Encoded Measurements
J. Romero
Qiang Fu
M. Ravasi
W. Heidrich
DiffM
113
0
0
23 Nov 2025
AFT: Appearance-Based Feature Tracking for Markerless and Training-Free Shape Reconstruction of Soft Robots
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Shangyuan Yuan
Preston Fairchild
Yu Mei
Xinyu Zhou
Xiaobo Tan
3DV
106
0
0
22 Nov 2025
Not Quite Anything: Overcoming SAMs Limitations for 3D Medical Imaging
Keith Moore
MedIm
118
0
0
22 Nov 2025
RoboArmGS: High-Quality Robotic Arm Splatting via Bézier Curve Refinement
Hao Wang
Xiaobao Wei
Ying Li
Qingpo Wuwu
Dongli Wu
Jiajun Cao
Ming Lu
Wenzhao Zheng
Shanghang Zhang
44
0
0
22 Nov 2025
Stable Offline Hand-Eye Calibration for any Robot with Just One Mark
Sicheng Xie
Lingchen Meng
Zhiying Du
Shuyuan Tu
Haidong Cao
Jiaqi Leng
Z. F. Wu
Yu-Gang Jiang
136
0
0
21 Nov 2025
SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding
Nikolay Nikolov
Giuliano Albanese
Sombit Dey
Aleksandar Yanev
Luc Van Gool
Jan-Nico Zaech
D. Paudel
LM&Ro
232
0
0
21 Nov 2025
Illustrator's Depth: Monocular Layer Index Prediction for Image Decomposition
Nissim Maruani
Peiying Zhang
Siddhartha Chaudhuri
Matthew Fisher
Nanxuan Zhao
Vladimir G. Kim
Pierre Alliez
Mathieu Desbrun
Wang Yifan
MDE
174
0
0
21 Nov 2025
Controllable Layer Decomposition for Reversible Multi-Layer Image Generation
Zihao Liu
Zunnan Xu
Shi Shu
Jun Zhou
Ruicheng Zhang
Zhenchao Tang
Xiu Li
166
0
0
20 Nov 2025
Flow and Depth Assisted Video Prediction with Latent Transformer
Eliyas Suleyman
Paul Henderson
Eksan Firkat
Nicolas Pugeault
74
0
0
20 Nov 2025
SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking
Haofeng Liu
Ziyue Wang
Sudhanshu Mishra
Mingqi Gao
Guanyi Qin
Chang Han Low
Alex Y. W. Kong
Yueming Jin
VLM
56
1
0
20 Nov 2025
SVG360: Multi-View SVG Generation with Geometric and Color Consistency from a Single SVG
Mengnan Jiang
Zhaolin Sun
Christian Franke
Michele Franco Adesso
Antonio Haas
Grace Li Zhang
3DGS
124
0
0
20 Nov 2025
Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click
Raphael Ruschel
Hardikkumar Prajapati
Awsafur Rahman
B. S. Manjunath
192
0
0
20 Nov 2025
EfficientSAM3: Progressive Hierarchical Distillation for Video Concept Segmentation from SAM1, 2, and 3
Chengxi Zeng
Yuxuan Jiang
Aaron Zhang
VLM
241
1
0
19 Nov 2025
WALDO: Where Unseen Model-based 6D Pose Estimation Meets Occlusion
Sajjad Pakdamansavoji
Yintao Ma
Amir Rasouli
Tongtong Cao
32
0
0
19 Nov 2025
The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification
Dante Francisco Wasmuht
Otto Brookes
Maximillian Schall
Pablo Palencia
Chris Beirne
...
Yuan-Ting Hu
Baishan Guo
Andrew Westbury
Kate Saenko
Didac Suris
153
0
0
19 Nov 2025
Deep Learning for Accurate Vision-based Catch Composition in Tropical Tuna Purse Seiners
Xabier Lekunberri
Ahmad Kamal
Izaro Goienetxea
Jon Ruiz
Iñaki Quincoces
Jaime Valls Miro
Ignacio Arganda-Carreras
Jose A. Fernandes-Salvador
100
0
0
19 Nov 2025
Aerial Assistance System for Automated Firefighting during Turntable Ladder Operations
Jan Quenzel
Valerij Sekin
Daniel Schleich
Alexander Miller
Merlin Stampa
Norbert Pahlke
Christof Röhrig
Sven Behnke
56
0
0
18 Nov 2025
Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline
Rui Zuo
Qinyue Tong
Zhe-ming Lu
Ziqian Lu
87
0
0
17 Nov 2025
Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting
Jiangnan Ye
Jiedong Zhuang
Lianrui Mu
Wenjie Zheng
Jiaqi Hu
Xingze Zou
Jing Wang
Haoji Hu
3DGS
112
0
0
17 Nov 2025
Medal S: Spatio-Textual Prompt Model for Medical Segmentation
Pengcheng Shi
Jiawei Chen
Jiaqi Liu
Xinglin Zhang
Tao Chen
Lei Li
MedIm
VLM
196
0
0
17 Nov 2025
C3Net: Context-Contrast Network for Camouflaged Object Detection
Baber Jan
Aiman El-Maleh
Abdul Jabbar Siddiqui
Abdul Bais
Saeed Anwar
42
0
0
16 Nov 2025
ActiveGrasp: Information-Guided Active Grasping with Calibrated Energy-based Model
Boshu Lei
Wen Jiang
Kostas Daniilidis
92
0
0
16 Nov 2025
RoboAfford++: A Generative AI-Enhanced Dataset for Multimodal Affordance Learning in Robotic Manipulation and Navigation
Xiaoshuai Hao
Yingbo Tang
Lingfeng Zhang
Yanbiao Ma
Yunfeng Diao
Ziyu Jia
Wenbo Ding
Hangjun Ye
L. Chen
LM&Ro
165
0
0
16 Nov 2025
Reasoning Text-to-Video Retrieval via Digital Twin Video Representations and Large Language Models
Yiqing Shen
Chenxiao Fan
Chenjia Li
Mathias Unberath
VGen
LRM
126
0
0
15 Nov 2025
Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning
Yiqing Shen
Mathias Unberath
LRM
57
0
0
15 Nov 2025
Fast Reasoning Segmentation for Images and Videos
Yiqing Shen
Mathias Unberath
VLM
LRM
76
0
0
15 Nov 2025
Changes in Real Time: Online Scene Change Detection with Multi-View Fusion
Chamuditha Jayanga Galappaththige
Jason Lai
Lloyd Windrim
D. Dansereau
Niko Sünderhauf
Dimity Miller
3DPC
172
0
0
15 Nov 2025
EgoEMS: A High-Fidelity Multimodal Egocentric Dataset for Cognitive Assistance in Emergency Medical Services
Keshara Weerasinghe
Xueren Ge
Tessa Heick
L. Wijayasingha
Anthony Cortez
Abhishek Satpathy
John A. Stankovic
H. Alemzadeh
130
0
0
13 Nov 2025
PriVi: Towards A General-Purpose Video Model For Primate Behavior In The Wild
Felix B. Mueller
Jan F. Meier
Timo Lueddecke
Richard Vogg
Roger L. Freixanet
...
Liran Samuni
Oliver Schülke
Neda Shahidi
Erin G. Wessling
Alexander S. Ecker
111
0
0
12 Nov 2025
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
Z. Liang
D. Zhang
Huichi Zhou
Rui Huang
Bobo Li
...
Shengqiong Wu
X. Wang
Jiebo Luo
Lizi Liao
Hao Fei
VGen
145
0
0
11 Nov 2025
Gaussian-Augmented Physics Simulation and System Identification with Complex Colliders
Federico Vasile
Ri-Zhao Qiu
Lorenzo Natale
Xiaolong Wang
100
0
0
10 Nov 2025
TwinOR: Photorealistic Digital Twins of Dynamic Operating Rooms for Embodied AI Research
Han-shen Zhang
Yiqing Shen
R. Soberanis-Mukul
Ankita Ghosh
Hao Ding
...
Wenjie Xiao
Lonny Yarmus
Angela Christine Argento
Masaru Ishii
Mathias Unberath
155
0
0
10 Nov 2025