SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2025

1 August 2024

Roman Rädle

Kalyan Vasudev Alwala

Nicolas Carion

Chao-Yuan Wu

Ross B. Girshick

Piotr Dollár

Christoph Feichtenhofer

VLM

MLLM

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 863 papers shown

Adaptive Articulated Object Manipulation On The Fly with Foundation Model Reasoning and Part Grounding

164

24 Jul 2025

Moving Object Detection from Moving Camera Using Focus of Expansion Likelihood and Segmentation

Masahiro Ogawa

Qi An

Atsushi Yamashita

155

18 Jul 2025

Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation

...

214

15 Jul 2025

From Wardrobe to Canvas: Wardrobe Polyptych LoRA for Part-level Controllable Human Image Generation

293

14 Jul 2025

Visuo-Acoustic Hand Pose and Contact Estimation

150

13 Jul 2025

From One to More: Contextual Part Latents for 3D Generation

...

267

11 Jul 2025

HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking

370

10 Jul 2025

OTAS: Open-vocabulary Token Alignment for Outdoor Segmentation

Simon Schwaiger

Stefan Thalhammer

Wilfried Wöber

Gerald Steinbauer-Wagner

169

08 Jul 2025

Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion

314

08 Jul 2025

OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts

315

07 Jul 2025

ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts

234

06 Jul 2025

Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition

270

04 Jul 2025

The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio

Gopala Anumanchipalli

234

03 Jul 2025

SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment

283

03 Jul 2025

NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation

Max Gandyra

Alessandro Santonicola

Michael Beetz

261

02 Jul 2025

Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning

213

02 Jul 2025

LatentMove: Towards Complex Human Movement Video Generation

301

01 Jul 2025

Geological Everything Model 3D: A Promptable Foundation Model for Unified and Zero-Shot Subsurface Understanding

242

01 Jul 2025

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

393

01 Jul 2025

SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures

...

181

30 Jun 2025

SCORP: Scene-Consistent Object Refinement via Proxy Generation and Tuning

206

30 Jun 2025

Grounding DINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models

IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control (IEEE TUFFC), 2025

Hamza Rasaee

Taha Koleilat

H. Rivaz

224

30 Jun 2025

Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data

Shubhabrata Mukherjee

168

30 Jun 2025

OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions

...

297

29 Jun 2025

ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts

268

27 Jun 2025

MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans

280

25 Jun 2025

Evaluating the Robustness of Open-Source Vision-Language Models to Domain Shift in Object Captioning

Federico Tavella

Amber Drinkwater

Angelo Cangelosi

24 Jun 2025

OmniGen2: Exploration to Advanced Multimodal Generation

...

333

173

23 Jun 2025

RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking

Teng Guo

Jingjin Yu

3DPC 3DV

233

20 Jun 2025

Co-Seg++: Mutual Prompt-Guided Collaborative Learning for Versatile Medical Segmentation

275

20 Jun 2025

DIGMAPPER: A Modular System for Automated Geologic Map Digitization

...

153

19 Jun 2025

ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models

...

212

19 Jun 2025

NTIRE 2025 Image Shadow Removal Challenge Report

Florin-Alexandru Vasluianu

...

218

18 Jun 2025

SynPo: Boosting Training-Free Few-Shot Medical Segmentation via High-Quality Negative Prompts

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

161

18 Jun 2025

Open-World Object Counting in Videos

Niki Amini-Naieni

Andrew Zisserman

178

18 Jun 2025

MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System

152

18 Jun 2025

Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins

235

16 Jun 2025

A Point Cloud Completion Approach for the Grasping of Partially Occluded Objects and Its Applications in Robotic Strawberry Harvesting

149

16 Jun 2025

A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects

Guohuan Xie

Syed Ariff Syed Hesham

177

16 Jun 2025

DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding

Thomas Kreutz

M. Mühlhäuser

Alejandro Sánchez Guinea

275

16 Jun 2025

Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors

218

15 Jun 2025

DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning

...

348

13 Jun 2025

In-Hand Object Pose Estimation via Visual-Tactile Fusion

293

12 Jun 2025

Constrained Diffusion Models for Synthesizing Representative Power Flow Datasets

Milad Hoseinpour

Vladimir Dvorkin

DiffM MedIm

243

12 Jun 2025

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation

Computer Vision and Pattern Recognition (CVPR), 2025

350

12 Jun 2025

DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers

445

12 Jun 2025

Efficient Part-level 3D Object Generation via Dual Volume Packing

313

11 Jun 2025

HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation

...

225

10 Jun 2025

iTACO: Interactable Digital Twins of Articulated Objects from Casually Captured RGBD Videos

322

10 Jun 2025

Segment Concealed Objects with Incomplete Supervision

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

...

237

10 Jun 2025