2408.00714
Cited By
SAM 2: Segment Anything in Images and Videos
International Conference on Learning Representations (ICLR), 2025
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM
MLLM
HuggingFace (116 upvotes)
Papers citing
"SAM 2: Segment Anything in Images and Videos"
50 / 861 papers shown
Interactive Segmentation and Report Generation for CT Images
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Yannian Gu
Wenhui Lei
Hanyu Chen
Xiaofan Zhang
Shanghang Zhang
210
0
0
05 Mar 2025
ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment
Shaofei Cai
Zhancun Mu
Hoang Trung-Dung
Yitao Liang
296
8
0
04 Mar 2025
One Patient's Annotation is Another One's Initialization: Towards Zero-Shot Surgical Video Segmentation with Cross-Patient Initialization
Seyed Amir Mousavi
Utku Ozbulak
Francesca Tozzi
Nikdokht Rashidian
W. Willaert
J. Vankerschaver
W. D. Neve
198
0
0
04 Mar 2025
Boltzmann Attention Sampling for Image Analysis with Small Objects
Computer Vision and Pattern Recognition (CVPR), 2025
Theodore Zhao
Sid Kiblawi
Naoto Usuyama
Ho Hin Lee
Sam Preston
Hoifung Poon
Mu-Hsin Wei
MedIm
447
2
0
04 Mar 2025
Tracking-Aware Deformation Field Estimation for Non-rigid 3D Reconstruction in Robotic Surgeries
Zeqing Wang
Han Fang
Yihong Xu
Yutong Ban
MedIm
292
1
0
04 Mar 2025
Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance
Jiayi Zhao
Fei Teng
Kai Luo
Guoqiang Zhao
Hui Yuan
Xu Zheng
Kailun Yang
VLM
362
9
0
04 Mar 2025
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Computer Vision and Pattern Recognition (CVPR), 2025
Zhipeng Huang
Shaobin Zhuang
Canmiao Fu
Binxin Yang
Ying Zhang
Chong Sun
Zhizheng Zhang
Yali Wang
Chen Li
Zheng-Jun Zha
DiffM
413
4
0
03 Mar 2025
Autonomous Dissection in Robotic Cholecystectomy
K. Oh
Leonardo Borgioli
Milos Zefran
Valentina Valle
P. Giulianotti
174
1
0
01 Mar 2025
Scalable Real2Sim: Physics-Aware Asset Generation Via Robotic Pick-and-Place Setups
Nicholas Pfaff
Evelyn Fu
Jeremy Binagia
Phillip Isola
Russ Tedrake
352
25
0
01 Mar 2025
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition
Computer Vision and Pattern Recognition (CVPR), 2025
Otto Brookes
Maksim Kukushkin
Majid Mirmehdi
Colleen Stephens
Paula Dieguez
...
Lukas Boesch
Thomas Schmid
M. Arandjelovic
H. Kühl
T. Burghardt
317
2
0
28 Feb 2025
Revisiting the Evaluation Bias Introduced by Frame Sampling Strategies in Surgical Video Segmentation Using SAM2
Utku Ozbulak
Seyed Amir Mousavi
Francesca Tozzi
Nikdokht Rashidian
W. Willaert
W. D. Neve
J. Vankerschaver
220
1
0
28 Feb 2025
MITracker: Multi-View Integration for Visual Object Tracking
Computer Vision and Pattern Recognition (CVPR), 2025
Mengjie Xu
Yitao Zhu
Haotian Jiang
Jiaming Li
Zhenrong Shen
...
Haolin Huang
Xinyu Wang
Qing Yang
H. Zhang
Qian Wang
278
2
0
27 Feb 2025
Best Foot Forward: Robust Foot Reconstruction in-the-wild
Kyle Fogarty
Jing Yang
Chayan Kumar Patodi
Aadi Bhanti
Steven Chacko
Cengiz Öztireli
Ujwal Bonde
327
0
0
27 Feb 2025
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
Toru Lin
Kartik Sachdev
Linxi Fan
Jitendra Malik
Yuke Zhu
387
47
0
27 Feb 2025
Vector-Quantized Vision Foundation Models for Object-Centric Learning
Rongzhen Zhao
V. Wang
Arno Solin
Joni Pajarinen
OCL
VLM
1.2K
3
0
27 Feb 2025
Deep learning approaches to surgical video segmentation and object detection: A Scoping Review
Devanish N. Kamtam
Joseph B. Shrager
Satya Deepya Malla
Nicole Lin
Juan J. Cardona
Jake J. Kim
Clarence Hu
174
13
0
23 Feb 2025
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
ACM Transactions on Graphics (TOG), 2025
Kaixin Yao
Longwen Zhang
Xinhao Yan
Yan Zeng
Qixuan Zhang
Wei Yang
Lan Xu
Jiayuan Gu
Jingyi Yu
421
42
0
18 Feb 2025
SurgPose: a Dataset for Articulated Robotic Surgical Tool Pose Estimation and Tracking
IEEE International Conference on Robotics and Automation (ICRA), 2025
Zijian Wu
Adam Schmidt
Randy Moore
Haoying Zhou
Alexandre Banks
Peter Kazanzides
Septimiu E. Salcudean
287
6
0
17 Feb 2025
Surgical Scene Understanding in the Era of Foundation AI Models: A Comprehensive Review
Ufaq Khan
Umair Nawaz
A. Qayyum
Shazad Ashraf
Yutong Xie
Muhammad Haris Khan
Muhammad Bilal
Junaid Qadir
474
5
0
16 Feb 2025
Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos
Weirui Ye
Fangchen Liu
Z. Ding
Yang Gao
Oleh Rybkin
Pieter Abbeel
VGen
OffRL
401
14
0
14 Feb 2025
Bilevel Learning for Bilevel Planning
Bowen Li
Tom Silver
Sebastian A. Scherer
Alexander G. Gray
627
6
0
12 Feb 2025
ImitDiff: Transferring Foundation-Model Priors for Distraction Robust Visuomotor Policy
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Yuhang Dong
Haizhou Ge
Yupei Zeng
Jing Zhang
Beiwen Tian
...
Ruixiang Wang
Ran Yi
Longhua Ma
323
1
0
11 Feb 2025
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
Li Hu
Guangyuan Wang
Zhen Shen
Xin Gao
Dechao Meng
Lian Zhuo
Peng Zhang
Bang Zhang
Liefeng Bo
DiffM
VGen
427
37
0
10 Feb 2025
Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform
K. Gao
Dening Lu
Liangzhi Li
Nan Chen
Hongjie He
Linlin Xu
Jonathan Li
3DGS
3DPC
AI4CE
527
1
0
09 Feb 2025
PixFoundation: Are We Heading in the Right Direction with Pixel-level Vision Foundation Models?
Mennatullah Siam
VLM
773
3
0
06 Feb 2025
No Free Lunch in Annotation either: An objective evaluation of foundation models for streamlining annotation in animal tracking
IEEE International Symposium on Biomedical Imaging (ISBI), 2025
Emil Mededovic
Valdy Laurentius
Yuli Wu
Marcin Kopaczka
Zhu Chen
Mareike Schulz
René Tolba
Johannes Stegmaier
328
1
0
06 Feb 2025
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
Jinbo Xing
Long Mai
Cusuh Ham
Jiahui Huang
Aniruddha Mahapatra
Chi-Wing Fu
T. Wong
Feng Liu
DiffM
VGen
589
27
0
06 Feb 2025
DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models
Lingshun Kong
Jiawei Zhang
Dongqing Zou
Jimmy S. J. Ren
Xiaohe Wu
Jiangxin Dong
Jinshan Pan
DiffM
304
5
0
06 Feb 2025
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
Yunuo Chen
Junli Cao
Vidit Goel
Sergei Korolev
Sergey Tulyakov
Jian Ren
DiffM
VGen
403
8
0
05 Feb 2025
Particle Trajectory Representation Learning with Masked Point Modeling
Sam Young
Yeon-jae Jwa
Kazuhiro Terao
3DPC
344
3
0
04 Feb 2025
Exploring Few-Shot Defect Segmentation in General Industrial Scenarios with Metric Learning and Vision Foundation Models
Tongkun Liu
Bing Li
Xiao Jin
Yupeng Shi
Qiuying Li
Xiang Wei
421
2
0
03 Feb 2025
Not Every Patch is Needed: Towards a More Efficient and Effective Backbone for Video-based Person Re-identification
IEEE Transactions on Image Processing (IEEE TIP), 2025
Lanyun Zhu
Tianrun Chen
Deyi Ji
Jieping Ye
Jing Liu
419
7
0
28 Jan 2025
Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors
Zhiyuan Lu
Hao Lu
Hua Huang
937
0
0
28 Jan 2025
MADation: Face Morphing Attack Detection with Foundation Models
Eduarda Caldeira
Guray Ozgur
Tahar Chettaoui
Marija Ivanovska
Peter Peer
Fadi Boutros
Vitomir Štruc
Naser Damer
CVBM
320
10
1
28 Jan 2025
Objects matter: object-centric world models improve reinforcement learning in visually complex environments
Weipu Zhang
Adam Jelley
Trevor A. McInroe
Amos Storkey
OCL
OffRL
157
4
0
27 Jan 2025
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation
Fu Rong
Meng Lan
Qian Zhang
Guang Dai
VOS
VGen
615
3
0
23 Jan 2025
Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos
Xianrui Luo
Juewen Peng
Zhongang Cai
Lei Yang
Fan Yang
Zhiguo Cao
Guosheng Lin
VGen
1.3K
2
0
23 Jan 2025
DynamicEarth: How Far are We from Open-Vocabulary Change Detection?
Kaiyu Li
Xiangyong Cao
Yupeng Deng
Chao Pang
Zepeng Xin
Deyu Meng
Zhi Wang
ObjD
322
10
0
22 Jan 2025
Adapting OpenAI's CLIP Model for Few-Shot Image Inspection in Manufacturing Quality Control: An Expository Case Study with Multiple Application Examples
Fadel M. Megahed
Ying-Ju Chen
B. Colosimo
M. Grasso
L. Allison Jones-Farmer
S. Knoth
Hongyue Sun
Inez M. Zwetsloot
AAML
VLM
322
5
0
22 Jan 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
558
121
0
21 Jan 2025
Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation
Xingxin He
Yifan Hu
Zhaoye Zhou
Mohamed Jarraya
Fang Liu
VLM
MedIm
298
5
0
17 Jan 2025
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Computer Vision and Pattern Recognition (CVPR), 2025
Sitong Gong
Yunzhi Zhuge
Lu Zhang
Zhiyong Yang
Pingping Zhang
Huchuan Lu
248
19
0
15 Jan 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Computer Vision and Pattern Recognition (CVPR), 2025
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
530
9
0
14 Jan 2025
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
Computer Vision and Pattern Recognition (CVPR), 2025
Weixi Feng
Chao Liu
Sifei Liu
William Yang Wang
Arash Vahdat
Weili Nie
VGen
DiffM
207
11
0
13 Jan 2025
EdgeTAM: On-Device Track Anything Model
Computer Vision and Pattern Recognition (CVPR), 2025
Chong Zhou
Chenchen Zhu
Yunyang Xiong
Saksham Suri
Fanyi Xiao
...
Raghuraman Krishnamoorthi
Bo Dai
Chen Change Loy
Vikas Chandra
Bilge Soran
VLM
313
8
0
13 Jan 2025
Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning
IEEE International Conference on Robotics and Automation (ICRA), 2025
Juntao Ren
Priya Sundaresan
Dorsa Sadigh
Sanjiban Choudhury
Jeannette Bohg
310
50
0
13 Jan 2025
Static Segmentation by Tracking: A Label-Efficient Approach for Fine-Grained Specimen Image Segmentation
Zhenyang Feng
Zihe Wang
Saul Ibaven Bueno
Tomasz Frelek
...
Hilmar Lapp
Charles V. Stewart
T. Berger-Wolf
Yu-Chuan Su
Wei-Lun Chao
292
0
0
12 Jan 2025
Zero-shot Shark Tracking and Biometrics from Aerial Imagery
Methods in Ecology and Evolution (MEE), 2025
Chinmay K Lalgudi
Mark E Leone
Jaden V Clark
Sergio Madrigal-Mora
Mario Espinoza
129
4
0
10 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
Xianrui Li
Tao Zhang
Zilong Huang
Shilin Xu
...
Yunhai Tong
Lu Qi
Jiashi Feng
Ming-Hsuan Yang
VLM
612
68
0
07 Jan 2025
Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT
Wen-Dong Jiang
Chih-Yung Chang
Diptendu Sinha Roy
532
2
0
07 Jan 2025