SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2025

1 August 2024

Roman Rädle

Kalyan Vasudev Alwala

Nicolas Carion

Chao-Yuan Wu

Ross B. Girshick

Piotr Dollár

Christoph Feichtenhofer

VLM

MLLM


Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 859 papers shown

Sa2VA-i: Improving Sa2VA Results with Consistent Training and Inference

172

23 Sep 2025

StereoFoley: Object-Aware Stereo Audio Generation from Video

237

22 Sep 2025

UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning

375

22 Sep 2025

Towards Learning Boulder Excavation with Hydraulic Excavators

22 Sep 2025

Learning Geometry-Aware Nonprehensile Pushing and Pulling with Dexterous Hands

216

22 Sep 2025

SAMSON: 3rd Place Solution of LSVOS 2025 VOS Challenge

110

22 Sep 2025

OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

...

164

22 Sep 2025

DepTR-MOT: Unveiling the Potential of Depth-Informed Trajectory Refinement for Multi-Object Tracking

263

22 Sep 2025

MRN: Harnessing 2D Vision Foundation Models for Diagnosing Parkinson's Disease with Limited 3D MR Data

112

22 Sep 2025

Language-in-the-Loop Culvert Inspection on the Erie Canal

Yashom Dighe

Yash Turkar

Karthik Dantu

22 Sep 2025

VAInpaint: Zero-Shot Video-Audio inpainting framework with LLMs-driven Module

115

21 Sep 2025

History-Aware Visuomotor Policy Learning via Point Tracking

153

21 Sep 2025

The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA

283

21 Sep 2025

RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation

250

20 Sep 2025

Enriched Feature Representation and Motion Prediction Module for MOSEv2 Track of 7th LSVOS Challenge: 3rd Place Solution

Chang Soo Lim

Joonyoung Moon

Donghyeon Cho

19 Sep 2025

Neural Atlas Graphs for Dynamic Scene Decomposition and Editing

Jan Philipp Schneider

201

19 Sep 2025

Qianfan-VL: Domain-Enhanced Universal Vision-Language Models

...

19 Sep 2025

Sparse Multiview Open-Vocabulary 3D Detection

Olivier Moliner

Viktor Larsson

Kalle Åström

116

19 Sep 2025

ENSAM: an efficient foundation model for interactive segmentation of 3D medical images

Elias Stenhede

Agnar Martin Bjørnstad

Arian Ranjbar

MedIm

103

19 Sep 2025

ORB: Operating Room Bot, Automating Operating Room Logistics through Mobile Manipulation

19 Sep 2025

RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation

...

18 Sep 2025

DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images

Kazuma Nagata

Naoshi Kaneko

DiffM

216

18 Sep 2025

Unleashing the Potential of Multimodal LLMs for Zero-Shot Spatio-Temporal Video Grounding

137

18 Sep 2025

Pseudo-Label Enhanced Cascaded Framework: 2nd Technical Report for LSVOS 2025 VOS Track

140

18 Sep 2025

Wan-Animate: Unified Character Animation and Replacement with Holistic Replication

...

238

17 Sep 2025

Reinforcement Learning for Robotic Insertion of Flexible Cables in Industrial Settings

108

17 Sep 2025

Controllable-Continuous Color Editing in Diffusion Model via Color Mapping

148

17 Sep 2025

4DRadar-GS: Self-Supervised Dynamic Driving Scene Reconstruction with 4D Radar

152

16 Sep 2025

Road Obstacle Video Segmentation

Shyam Nandan Rai

Shyamgopal Karthik

Mariana-Iuliana Georgescu

217

16 Sep 2025

IMD: A 6-DoF Pose Estimation Benchmark for Industrial Metallic Objects

153

15 Sep 2025

AssemMate: Graph-Based LLM for Robotic Assembly Assistance

141

15 Sep 2025

BREA-Depth: Bronchoscopy Realistic Airway-geometric Depth Estimation

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

Francis Xiatian Zhang

134

15 Sep 2025

FS-SAM2: Adapting Segment Anything Model 2 for Few-Shot Semantic Segmentation via Low-Rank Adaptation

123

15 Sep 2025

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

...

283

15 Sep 2025

From Pixels to Shelf: End-to-End Algorithmic Control of a Mobile Manipulator for Supermarket Stocking and Fronting

Davide Peron

Victor Nan Fernandez-Ayala

Lukas Segelmark

15 Sep 2025

U-Mamba2: Scaling State Space Models for Dental Anatomy Segmentation in CBCT

222

15 Sep 2025

Towards Understanding Visual Grounding in Visual Language Models

Georgios Pantazopoulos

Eda B. Özyiğit

ObjD

314

12 Sep 2025

T2Bs: Text-to-Character Blendshapes via Video Generation

...

222

12 Sep 2025

Multimodal SAM-adapter for Semantic Segmentation

IEEE Access, 2025

Iacopo Curti

Pierluigi Zama Ramirez

Alioscia Petrelli

Luigi Di Stefano

137

12 Sep 2025

SegSLR: Promptable Video Segmentation for Isolated Sign Language Recognition

247

12 Sep 2025

Segment Anything for Cell Tracking

12 Sep 2025

Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation

483

12 Sep 2025

PeftCD: Leveraging Vision Foundation Models with Parameter-Efficient Fine-Tuning for Remote Sensing Change Detection

124

11 Sep 2025

ObjectReact: Learning Object-Relative Control for Visual Navigation

138

11 Sep 2025

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

...

338

11 Sep 2025

Calib3R: A 3D Foundation Model for Multi-Camera to Robot Calibration and 3D Metric-Scaled Scene Reconstruction

Davide Allegro

Matteo Terreran

Stefano Ghidoni

120

10 Sep 2025

Live(r) Die: Predicting Survival in Colorectal Liver Metastasis

Muhammad Alberb

H. Cheung

Anne L. Martel

10 Sep 2025

SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video

David Stotko

Reinhard Klein

3DH

125

10 Sep 2025

MVAT: Multi-View Aware Teacher for Weakly Supervised 3D Object Detection

Saad Lahlali

Alexandre Fournier-Montgieux

123

09 Sep 2025

WS^2: Weakly Supervised Segmentation using Before-After Supervision in Waste Sorting

116

08 Sep 2025