SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2025

1 August 2024

Roman Rädle

Kalyan Vasudev Alwala

Nicolas Carion

Chao-Yuan Wu

Ross B. Girshick

Piotr Dollár

Christoph Feichtenhofer

VLM

MLLM


Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 861 papers shown

How Can Objects Help Video-Language Understanding?

356

10 Apr 2025

Are We Done with Object-Centric Learning?

Alexander Rubinstein

Christian Schroeder de Witt

Matthias Bethge

Seong Joon Oh

OCL

2.1K

09 Apr 2025

Few-Shot Adaptation of Grounding DINO for Agricultural Domain

309

09 Apr 2025

Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation

289

08 Apr 2025

HER-Seg: Holistically Efficient Segmentation for High-Resolution Medical Images

Tesema Fiseha Berhanu

284

08 Apr 2025

S^4M: Boosting Semi-Supervised Instance Segmentation with SAM

237

07 Apr 2025

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

...

263

07 Apr 2025

Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision

320

07 Apr 2025

CMaP-SAM: Contraction Mapping Prior for SAM-driven Few-shot Segmentation

270

07 Apr 2025

SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation

718

06 Apr 2025

Multi-identity Human Image Animation with Structural Video Diffusion

263

05 Apr 2025

Performance Analysis of Deep Learning Models for Femur Segmentation in MRI Scan

Conference on Algebraic Informatics (AI), 2025

202

05 Apr 2025

Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments

294

03 Apr 2025

MG-Gen: Single Image to Motion Graphics Generation

605

03 Apr 2025

ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer

Computer Vision and Pattern Recognition (CVPR), 2025

317

03 Apr 2025

BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation

...

691

03 Apr 2025

COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking

Information Fusion (Inf. Fusion), 2025

290

02 Apr 2025

UnIRe: Unsupervised Instance Decomposition for Dynamic Urban Scene Reconstruction

1.0K

01 Apr 2025

WorldScore: A Unified Evaluation Benchmark for World Generation

401

01 Apr 2025

Coca-Splat: Collaborative Optimization for Camera Parameters and 3D Gaussians

333

01 Apr 2025

Zero-Shot 4D Lidar Panoptic Segmentation

Computer Vision and Pattern Recognition (CVPR), 2025

350

01 Apr 2025

RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety

Computer Vision and Pattern Recognition (CVPR), 2025

371

01 Apr 2025

PolygoNet: Leveraging Simplified Polygonal Representation for Effective Image Classification

Salim Khazem

Jérémy Fix

Cédric Pradalier

167

01 Apr 2025

ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos

IEEE International Conference on Robotics and Automation (ICRA), 2025

287

31 Mar 2025

Easi3R: Estimating Disentangled Motion from DUSt3R Without Training

398

31 Mar 2025

Multi-Task Learning for Extracting Menstrual Characteristics from Clinical Notes

298

31 Mar 2025

SALT: A Flexible Semi-Automatic Labeling Tool for General LiDAR Point Clouds with Cross-Scene Adaptability and 4D Consistency

374

31 Mar 2025

SAVeD: Learning to Denoise Low-SNR Video for Improved Downstream Performance

411

31 Mar 2025

ReferDINO-Plus: 2nd Solution for 4th PVUW MeViS Challenge at CVPR 2025

304

30 Mar 2025

EAP4EMSIG -- Enhancing Event-Driven Microscopy for Microfluidic Single-Cell Analysis

Nils Friederich

Angelo Jovin Yamachui Sitcheu

...

280

30 Mar 2025

A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery

International Journal of Remote Sensing (IJRS), 2025

365

29 Mar 2025

Segment Any Motion in Videos

Computer Vision and Pattern Recognition (CVPR), 2025

318

28 Mar 2025

Segment then Splat: Unified 3D Open-Vocabulary Segmentation via Gaussian Splatting

266

28 Mar 2025

Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying

278

27 Mar 2025

A Unified Image-Dense Annotation Generation Model for Underwater Scenes

Computer Vision and Pattern Recognition (CVPR), 2025

331

27 Mar 2025

Online Reasoning Video Segmentation with Just-in-Time Digital Twins

Yiqing Shen

Bohan Liu

Chenjia Li

Lalithkumar Seenivasan

Mathias Unberath

VOS

420

27 Mar 2025

Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields

Computer Vision and Pattern Recognition (CVPR), 2025

...

369

26 Mar 2025

DINeMo: Learning Neural Mesh Models with no 3D Annotations

371

26 Mar 2025

DynOPETs: A Versatile Benchmark for Dynamic Object Pose Estimation and Tracking in Moving Camera Scenarios

IEEE Robotics and Automation Letters (IEEE RA-L), 2025

323

25 Mar 2025

Semi-SMD: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving

417

25 Mar 2025

LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration

555

25 Mar 2025

CamSAM2: Segment Anything Accurately in Camouflaged Videos

344

25 Mar 2025

RP-SAM2: Refining Point Prompts for Stable Surgical Instrument Segmentation

314

25 Mar 2025

RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation

327

24 Mar 2025

OmnimatteZero: Fast Training-free Omnimatte with Pre-trained Video Diffusion Models

376

23 Mar 2025

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

...

661

23 Mar 2025

Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook

380

23 Mar 2025

SALT: Parameter-Efficient Fine-Tuning via Singular Value Adaptation with Low-Rank Transformation

337

20 Mar 2025

MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

470

20 Mar 2025

M3: 3D-Spatial MultiModal Memory

International Conference on Learning Representations (ICLR), 2025

261

20 Mar 2025