SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2025

1 August 2024

Roman Rädle

Kalyan Vasudev Alwala

Nicolas Carion

Chao-Yuan Wu

Ross B. Girshick

Piotr Dollár

Christoph Feichtenhofer

VLM

MLLM


Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 863 papers shown

Visual Imitation Enables Contextual Humanoid Control

1.3K

06 May 2025

6D Pose Estimation on Spoons and Hands

Kevin Tan

Fan Yang

Yuxiao Chen

247

05 May 2025

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

...

1.2K

05 May 2025

Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions

...

379

04 May 2025

Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation

1.1K

04 May 2025

SignSplat: Rendering Sign Language via Gaussian Splatting

368

04 May 2025

Prompt-responsive Object Retrieval with Memory-augmented Student-Teacher Learning

IEEE International Conference on Robotics and Automation (ICRA), 2025

Malte Mosbach

Sven Behnke

197

04 May 2025

Segment Any RGB-Thermal Model with Language-aided Distillation

475

04 May 2025

Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2

IEEE Transactions on Medical Imaging (IEEE TMI), 2025

485

03 May 2025

RESAnything: Attribute Prompting for Arbitrary Referring Segmentation

Ruiqi Wang

Hao Zhang

VLM

282

03 May 2025

Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging

Elena Mulero Ayllón

Massimiliano Mantegna

261

02 May 2025

Improving Editability in Image Generation with Layer-wise Memory

Computer Vision and Pattern Recognition (CVPR), 2025

299

02 May 2025

Zoomer: Adaptive Image Focus Optimization for Black-box MLLM

...

390

30 Apr 2025

UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation

...

Ronald Cheong Kin Chan

Yifan Peng

Pranav Rajpurkar

Hao Chen

LM&MA MedIm

665

30 Apr 2025

PRISM-DP: Spatial Pose-based Observations for Diffusion-Policies via Segmentation, Mesh Generation, and Pose Tracking

397

29 Apr 2025

Dexonomy: Synthesizing All Dexterous Grasp Types in a Grasp Taxonomy

307

26 Apr 2025

SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models

797

25 Apr 2025

Step1X-Edit: A Practical Framework for General Image Editing

...

762

174

24 Apr 2025

PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation

419

23 Apr 2025

Physically Consistent Humanoid Loco-Manipulation using Latent Diffusion Models

271

23 Apr 2025

AffordanceSAM: Segment Anything Once More in Affordance Grounding

307

22 Apr 2025

Model-based Metric 3D Shape and Motion Reconstruction of Wild Bottlenose Dolphins in Drone-Shot Videos

405

22 Apr 2025

LSP-ST: Ladder Shape-Biased Side-Tuning for Robust Infrared Small Target Detection

304

20 Apr 2025

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Krishna Murthy Jatavallabhula

...

286

19 Apr 2025

HSACNet: Hierarchical Scale-Aware Consistency Regularized Semi-Supervised Change Detection

182

18 Apr 2025

Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration

410

17 Apr 2025

Perception Encoder: The best visual embeddings are not at the output of the network

Daniel Bolya

Po-Yao (Bernie) Huang

...

Christoph Feichtenhofer

ObjD VOS

675

118

17 Apr 2025

A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation

...

631

17 Apr 2025

Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach

397

16 Apr 2025

AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection

363

16 Apr 2025

How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions

Computer Vision and Pattern Recognition (CVPR), 2025

352

16 Apr 2025

ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping

Computer Vision and Pattern Recognition (CVPR), 2025

Shun Iwase

Zubair Irshad

Katherine Liu

Vitor Campagnolo Guizilini

...

347

15 Apr 2025

PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild

...

344

15 Apr 2025

OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding

605

15 Apr 2025

CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image

Computer Vision and Pattern Recognition (CVPR), 2025

446

15 Apr 2025

Aligning Anime Video Generation with Human Feedback

392

14 Apr 2025

MASSeg: 2nd Technical Report for 4th PVUW MOSE Track

231

14 Apr 2025

Enhanced Semantic Extraction and Guidance for UGC Image Super Resolution

368

14 Apr 2025

FVOS for MOSE Track of 4th PVUW Challenge: 3rd Place Solution

154

13 Apr 2025

ToolTipNet: A Segmentation-Driven Deep Learning Baseline for Surgical Instrument Tip Detection

Zijian Wu

Shuojue Yang

Yueming Jin

Septimiu E. Salcudean

MedIm

349

13 Apr 2025

PathSeqSAM: Sequential Modeling for Pathology Image Segmentation with SAM2

121

12 Apr 2025

FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents

221

11 Apr 2025

DSM: Constructing a Diverse Semantic Map for 3D Visual Grounding

317

11 Apr 2025

Parameter-Free Fine-tuning via Redundancy Elimination for Vision Foundation Models

219

11 Apr 2025

Palmprint De-Identification Using Diffusion Model for High-Quality and Diverse Synthesis

438

11 Apr 2025

RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements

245

11 Apr 2025

CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model

311

11 Apr 2025

DreamFuse: Adaptive Image Fusion with Diffusion Transformer

221

11 Apr 2025

ZS-VCOS: Zero-Shot Video Camouflaged Object Segmentation By Optical Flow and Open Vocabulary Object Detection

441

10 Apr 2025

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

365

10 Apr 2025