SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2025

1 August 2024

Roman Rädle

Kalyan Vasudev Alwala

Nicolas Carion

Chao-Yuan Wu

Ross B. Girshick

Piotr Dollár

Christoph Feichtenhofer

VLM

MLLM

Papers citing "SAM 2: Segment Anything in Images and Videos"

Semantic Exploration and Dense Mapping of Complex Environments using Ground Robot with Panoramic LiDAR-Camera Fusion

IEEE Robotics and Automation Letters (IEEE RA-L), 2025

Srinivas Chowdary Ramineni

K. Shimada

241

28 May 2025

InfoSAM: Fine-Tuning the Segment Anything Model from An Information-Theoretic Perspective

237

28 May 2025

DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation

371

28 May 2025

SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning

224

28 May 2025

Geometric Feature Prompting of Image Segmentation Models

131

27 May 2025

SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation

258

27 May 2025

PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation

Robotics (RAS), 2025

287

27 May 2025

OccLE: Label-Efficient 3D Semantic Occupancy Prediction

572

27 May 2025

Frame In-N-Out: Unbounded Controllable Image-to-Video Generation

387

27 May 2025

Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

...

361

27 May 2025

AniCrafter: Customizing Realistic Human-Centric Animation via Avatar-Background Conditioning in Video Diffusion Models

327

26 May 2025

CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting

349

26 May 2025

CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos

341

24 May 2025

Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning

Junlin Wang

Zhiyun Lin

1.5K

24 May 2025

So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

...

711

24 May 2025

Instruct2See: Learning to Remove Any Obstructions Across Distributions

314

23 May 2025

Weakly-supervised Mamba-Based Mastoidectomy Shape Prediction for Cochlear Implant Surgery Using 3D T-Distribution Loss

Yike Zhang

Jack H. Noble

371

23 May 2025

Track Anything Annotate: Video annotation and dataset generation of computer vision models

161

23 May 2025

REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders

403

23 May 2025

ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback

224

23 May 2025

Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery

308

23 May 2025

H2-COMPACT: Human-Humanoid Co-Manipulation via Adaptive Contact Trajectory Policies

Geeta Chandra Raju Bethala

297

23 May 2025

Auto-nnU-Net: Towards Automated Medical Image Segmentation

549

22 May 2025

gen2seg: Generative Models Enable Generalizable Instance Segmentation

Om Khangaonkar

Hamed Pirsiavash

DiffM VLM

456

21 May 2025

From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

577

21 May 2025

Advancing Marine Research: UWSAM Framework and UIIS10K Dataset for Precise Underwater Instance Segmentation

Laurence Tianruo Yang

Weidong Zhang

Sam Kwong

VLM

454

21 May 2025

TAGS: 3D Tumor-Adaptive Guidance for SAM

466

21 May 2025

Scaling Vision Mamba Across Resolutions via Fractal Traversal

389

20 May 2025

Unlocking the Power of SAM 2 for Few-Shot Segmentation

282

20 May 2025

GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation

457

19 May 2025

Improving Compositional Generation with Diffusion Models Using Lift Scores

Chenning Yu

Sicun Gao

DiffM

1.2K

19 May 2025

3D Visual Illusion Depth Estimation

682

19 May 2025

VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning

491

17 May 2025

MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection

269

16 May 2025

Visual Planning: Let's Think Only with Images

457

16 May 2025

ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation

371

14 May 2025

Air-Ground Collaboration for Language-Specified Missions in Unknown Environments

325

14 May 2025

Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness

413

13 May 2025

ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

262

13 May 2025

Extracting Visual Plans from Unlabeled Videos via Symbolic Guidance

329

13 May 2025

When Dance Video Archives Challenge Computer Vision

159

12 May 2025

ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation

267

12 May 2025

The First WARA Robotics Mobile Manipulation Challenge -- Lessons Learned

European Conference on Mobile Robots (ECMR), 2025

David Cáceres-Domínguez

...

235

11 May 2025

Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation

Zechu Li

Yufeng Jin

Daniel Felipe Ordoñez Apraez

Claudio Semini

Puze Liu

Georgia Chalvatzaki

995

08 May 2025

D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation

393

08 May 2025

UncertainSAM: Fast and Efficient Uncertainty Quantification of the Segment Anything Model

Timo Kaiser

Thomas Norrenbrock

Bodo Rosenhahn

632

08 May 2025

DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception

Computer Vision and Pattern Recognition (CVPR), 2025

309

07 May 2025

RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph

1.0K

06 May 2025

DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes

325

06 May 2025

Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation

Gabriele Rosi

Fabio Cermelli

VLM

472

06 May 2025