2408.00714
SAM 2: Segment Anything in Images and Videos
International Conference on Learning Representations (ICLR), 2025
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM
MLLM
Papers citing "SAM 2: Segment Anything in Images and Videos"
50 / 859 papers shown
Sa2VA-i: Improving Sa2VA Results with Consistent Training and Inference
Alexey Nekrasov
A. Athar
Daan de Geus
Alexander Hermans
Bastian Leibe
172
0
0
23 Sep 2025
StereoFoley: Object-Aware Stereo Audio Generation from Video
Tornike Karchkhadze
Kuan-Lin Chen
Mojtaba Heydari
Robert Henzel
Alessandro Toso
Mehrez Souden
DiffM
VGen
AuLLM
237
1
0
22 Sep 2025
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Ye Liu
Zongyang Ma
Junfu Pu
Zhongang Qi
Yang Wu
Mingyu Ding
Chang Wen Chen
MLLM
ObjD
LRM
375
2
0
22 Sep 2025
Towards Learning Boulder Excavation with Hydraulic Excavators
Jonas Gruetter
Lorenzo Terenzi
Pascal Egli
Marco Hutter
91
0
0
22 Sep 2025
Learning Geometry-Aware Nonprehensile Pushing and Pulling with Dexterous Hands
Yunshuang Li
Yiyang Ling
Gaurav Sukhatme
Daniel Seita
216
1
0
22 Sep 2025
SAMSON: 3rd Place Solution of LSVOS 2025 VOS Challenge
Yujie Xie
Hongyang Zhang
Zhihui Liu
Shihai Ruan
110
0
0
22 Sep 2025
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models
Jinshu Chen
Xinghui Li
Xu Bai
Tianxiang Ma
Pengze Zhang
...
Gen Li
Lijie Liu
Songtao Zhao
Bingchuan Li
Qian He
DiffM
VGen
164
1
0
22 Sep 2025
DepTR-MOT: Unveiling the Potential of Depth-Informed Trajectory Refinement for Multi-Object Tracking
Buyin Deng
Lingxin Huang
Kai Luo
Fei Teng
Kailun Yang
VOT
263
1
0
22 Sep 2025
MRN: Harnessing 2D Vision Foundation Models for Diagnosing Parkinson's Disease with Limited 3D MR Data
Ding Shaodong
Liu Ziyang
Zhou Yijun
Liu Tao
112
0
0
22 Sep 2025
Language-in-the-Loop Culvert Inspection on the Erie Canal
Yashom Dighe
Yash Turkar
Karthik Dantu
94
0
0
22 Sep 2025
VAInpaint: Zero-Shot Video-Audio inpainting framework with LLMs-driven Module
Kam Man Wu
Zeyue Tian
Liya Ji
Qifeng Chen
VGen
115
0
0
21 Sep 2025
History-Aware Visuomotor Policy Learning via Point Tracking
Jingjing Chen
Hongjie Fang
Chenxi Wang
Shiquan Wang
Cewu Lu
153
2
0
21 Sep 2025
The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA
Quanzhu Niu
Dengxian Gong
Shihao Chen
Tao Zhang
Yikang Zhou
Haobo Yuan
Lu Qi
Xiangtai Li
Shilin Xu
VOS
283
0
0
21 Sep 2025
RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation
Tianyi Yan
Wencheng Han
Xia Zhou
Xueyang Zhang
Kun Zhan
Cheng-Zhong Xu
Jianbing Shen
EGVM
VGen
250
4
0
20 Sep 2025
Enriched Feature Representation and Motion Prediction Module for MOSEv2 Track of 7th LSVOS Challenge: 3rd Place Solution
Chang Soo Lim
Joonyoung Moon
Donghyeon Cho
96
0
0
19 Sep 2025
Neural Atlas Graphs for Dynamic Scene Decomposition and Editing
Jan Philipp Schneider
Pratik Singh Bisht
Ilya Chugunov
A. Kolb
Michael Moeller
Felix Heide
201
1
0
19 Sep 2025
Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
Daxiang Dong
Mingming Zheng
Dong Xu
Bairong Zhuang
W. Zhang
...
Ruchang Yao
Ziye Yuan
J. Wu
Guangjun Xie
Dou Shen
VLM
96
1
0
19 Sep 2025
Sparse Multiview Open-Vocabulary 3D Detection
Olivier Moliner
Viktor Larsson
Kalle Åström
116
0
0
19 Sep 2025
ENSAM: an efficient foundation model for interactive segmentation of 3D medical images
Elias Stenhede
Agnar Martin Bjørnstad
Arian Ranjbar
MedIm
103
1
0
19 Sep 2025
ORB: Operating Room Bot, Automating Operating Room Logistics through Mobile Manipulation
Jinkai Qiu
Yungjun Kim
Gaurav Sethia
Tanmay Agarwal
Siddharth Ghodasara
Zackory Erickson
Jeffrey Ichnowski
99
0
0
19 Sep 2025
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
Yuming Jiang
Siteng Huang
Shengke Xue
Yaxi Zhao
Jun Cen
...
Kexiang Wang
Mingxiu Chen
F. Wang
Deli Zhao
Xin Li
VGen
LM&Ro
91
8
0
18 Sep 2025
DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images
Kazuma Nagata
Naoshi Kaneko
DiffM
216
0
0
18 Sep 2025
Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding
Zaiquan Yang
Yuhao Liu
Gerhard Hancke
Rynson W. H. Lau
AI4TS
137
2
0
18 Sep 2025
Pseudo-Label Enhanced Cascaded Framework: 2nd Technical Report for LSVOS 2025 VOS Track
An Yan
Leilei Cao
Feng Lu
Ran Hong
Youhai Jiang
Fengjie Zhu
140
0
0
18 Sep 2025
Wan-Animate: Unified Character Animation and Replacement with Holistic Replication
Gang Cheng
X. Gao
Li Hu
Siqi Hu
Mingyang Huang
...
Peng Zhang
Xindi Zhang
Zhe Zhang
Jingren Zhou
Lian Zhuo
VGen
238
15
0
17 Sep 2025
Reinforcement Learning for Robotic Insertion of Flexible Cables in Industrial Settings
Jeongwoo Park
Seabin Lee
Changmin Park
Wonjong Lee
Changjoo Nam
108
0
0
17 Sep 2025
Controllable-Continuous Color Editing in Diffusion Model via Color Mapping
Yuqi Yang
Dongliang Chang
Yuanchen Fang
Yi-Zhe Song
Zhanyu Ma
Jun Guo
DiffM
KELM
148
0
0
17 Sep 2025
4DRadar-GS: Self-Supervised Dynamic Driving Scene Reconstruction with 4D Radar
Xiao Tang
Guirong Zhuo
Cong Wang
Boyuan Zheng
Minqing Huang
Lianqing Zheng
Long Chen
Shouyi Lu
3DGS
152
0
0
16 Sep 2025
Road Obstacle Video Segmentation
Shyam Nandan Rai
Shyamgopal Karthik
Mariana-Iuliana Georgescu
Barbara Caputo
Carlo Masone
Zeynep Akata
VOS
217
0
0
16 Sep 2025
IMD: A 6-DoF Pose Estimation Benchmark for Industrial Metallic Objects
Ruimin Ma
Sebastian Zudaire
Zhen Li
Chi Zhang
153
0
0
15 Sep 2025
AssemMate: Graph-Based LLM for Robotic Assembly Assistance
Qi Zheng
Chaoran Zhang
Zijian Liang
Ente Lin
Shubo Cui
Qinghongbing Xie
Zhaobo Xu
Long Zeng
141
0
0
15 Sep 2025
BREA-Depth: Bronchoscopy Realistic Airway-geometric Depth Estimation
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Francis Xiatian Zhang
Emile Mackute
Mohammadreza Kasaei
Kevin Dhaliwal
Robert Thomson
Mohsen Khadem
134
2
0
15 Sep 2025
FS-SAM2: Adapting Segment Anything Model 2 for Few-Shot Semantic Segmentation via Low-Rank Adaptation
Bernardo Forni
Gabriele Lombardi
Federico Pozzi
Mirco Planamente
VLM
123
0
0
15 Sep 2025
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Y. Zhou
Yifan Wang
Jianjun Zhou
Wenzheng Chang
Haoyu Guo
...
Junyi Chen
Chunhua Shen
Jiangmiao Pang
Kaipeng Zhang
Tong He
VGen
283
5
0
15 Sep 2025
From Pixels to Shelf: End-to-End Algorithmic Control of a Mobile Manipulator for Supermarket Stocking and Fronting
Davide Peron
Victor Nan Fernandez-Ayala
Lukas Segelmark
69
0
0
15 Sep 2025
U-Mamba2: Scaling State Space Models for Dental Anatomy Segmentation in CBCT
Zhi Qin Tan
Xiatian Zhu
Owen Addison
Yunpeng Li
Mamba
AI4CE
222
1
0
15 Sep 2025
Towards Understanding Visual Grounding in Visual Language Models
Georgios Pantazopoulos
Eda B. Özyiğit
ObjD
314
3
0
12 Sep 2025
T2Bs: Text-to-Character Blendshapes via Video Generation
Jiahao Luo
Chaoyang Wang
Michael Vasilkovsky
V. Shakhrai
Di Liu
...
Sergey Tulyakov
Peter Wonka
Hsin-Ying Lee
James Davis
Jian Wang
DiffM
222
1
0
12 Sep 2025
Multimodal SAM-adapter for Semantic Segmentation
IEEE Access, 2025
Iacopo Curti
Pierluigi Zama Ramirez
Alioscia Petrelli
Luigi Di Stefano
137
1
0
12 Sep 2025
SegSLR: Promptable Video Segmentation for Isolated Sign Language Recognition
Sven Schreiber
Noha Sarhan
Simone Frintrop
Christian Wilms
SLR
VLM
247
0
0
12 Sep 2025
Segment Anything for Cell Tracking
Zhu Chen
Mert Edgü
Er Jin
Johannes Stegmaier
96
0
0
12 Sep 2025
Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation
Hao Zhang
Chun-Han Yao
Simon Donné
Narendra Ahuja
Varun Jampani
VGen
483
2
0
12 Sep 2025
PeftCD: Leveraging Vision Foundation Models with Parameter-Efficient Fine-Tuning for Remote Sensing Change Detection
Sijun Dong
Yuxuan Hu
Libo Wang
Geng Chen
Xiaoliang Meng
124
1
0
11 Sep 2025
ObjectReact: Learning Object-Relative Control for Visual Navigation
Sourav Garg
Dustin Craggs
Vineeth Bhat
Lachlan Mares
Stefan Podgorski
Madhava Krishna
Feras Dayoub
Ian Reid
138
1
0
11 Sep 2025
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
Jiahao Wang
Yufeng Yuan
Rujie Zheng
Youtian Lin
Jian Gao
...
Xiaoxiao Long
Hao Zhu
Z. Zhang
X. Cao
Yao Yao
VGen
338
11
0
11 Sep 2025
Calib3R: A 3D Foundation Model for Multi-Camera to Robot Calibration and 3D Metric-Scaled Scene Reconstruction
Davide Allegro
Matteo Terreran
Stefano Ghidoni
120
0
0
10 Sep 2025
Live(r) Die: Predicting Survival in Colorectal Liver Metastasis
Muhammad Alberb
H. Cheung
Anne L. Martel
96
0
0
10 Sep 2025
SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video
David Stotko
Reinhard Klein
3DH
125
0
0
10 Sep 2025
MVAT: Multi-View Aware Teacher for Weakly Supervised 3D Object Detection
Saad Lahlali
Alexandre Fournier-Montgieux
Nicolas Granger
Hervé Le Borgne
Quoc-Cuong Pham
3DPC
123
0
0
09 Sep 2025
WS^2: Weakly Supervised Segmentation using Before-After Supervision in Waste Sorting
Andrea Marelli
Alberto Foresti
Leonardo Pesce
Giacomo Boracchi
Mario Grosso
116
0
0
08 Sep 2025