SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2024

1 August 2024

Roman Rädle

Kalyan Vasudev Alwala

Nicolas Carion

Chao-Yuan Wu

Ross B. Girshick

Piotr Dollár

Christoph Feichtenhofer

VLM

MLLM

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 859 papers shown

ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking

379

06 Jan 2025

SAM-EM: Real-Time Segmentation for Automated Liquid Phase Transmission Electron Microscopy

167

06 Jan 2025

Soft and Compliant Contact-Rich Hair Manipulation and Care

IEEE/ACM International Conference on Human-Robot Interaction (HRI), 2025

300

05 Jan 2025

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control

538

02 Jan 2025

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Computer Vision and Pattern Recognition (CVPR), 2024

...

425

31 Dec 2024

ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation

325

31 Dec 2024

Gaussian Building Mesh (GBM): Extract a Building's 3D Mesh with Google Earth and Gaussian Splatting

361

31 Dec 2024

MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation

...

267

31 Dec 2024

BODex: Scalable and Efficient Robotic Dexterous Grasp Synthesis Using Bilevel Optimization

IEEE International Conference on Robotics and Automation (ICRA), 2024

Jiayi Chen

Yubin Ke

Hongan Wang

427

21 Dec 2024

MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance

Hallee E. Wong

Jose Javier Gonzalez Ortiz

John Guttag

Adrian V. Dalca

395

19 Dec 2024

M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation

Computer Vision and Pattern Recognition (CVPR), 2024

400

18 Dec 2024

Measurement of Medial Elbow Joint Space using Landmark Detection

IEEE Access, 2024

554

17 Dec 2024

IGR: Improving Diffusion Model for Garment Restoration from Person Image

350

16 Dec 2024

InterDyn: Controllable Interactive Dynamics with Video Diffusion Models

Computer Vision and Pattern Recognition (CVPR), 2024

Victoria Fernandez-Abrevaya

VGen AI4CE

630

16 Dec 2024

Rethinking Detecting Salient and Camouflaged Objects in Unconstrained Scenes

456

14 Dec 2024

Agtech Framework for Cranberry-Ripening Analysis Using Vision Foundation Models

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

215

12 Dec 2024

Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

Computer Vision and Pattern Recognition (CVPR), 2024

294

12 Dec 2024

Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting

343

05 Dec 2024

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios

416

05 Dec 2024

Referring Video Object Segmentation via Language-aligned Track Selection

426

02 Dec 2024

T-3DGS: Removing Transient Objects for 3D Scene Reconstruction

Vadim Pryadilshchikov

476

29 Nov 2024

Autonomous Imagination: Closed-Loop Decomposition of Visual-to-Textual Conversion in Visual Reasoning for Multimodal Large Language Models

513

27 Nov 2024

Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis

382

26 Nov 2024

VideoDirector: Precise Video Editing via Text-to-Video Models

Computer Vision and Pattern Recognition (CVPR), 2024

501

26 Nov 2024

SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation

Computer Vision and Pattern Recognition (CVPR), 2024

461

26 Nov 2024

vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation

Computer Vision and Pattern Recognition (CVPR), 2024

409

26 Nov 2024

Leveraging Foundation Models To learn the shape of semi-fluid deformable objects

250

25 Nov 2024

Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024

Niharika Singri Prasad

K. Murali

Mohit Jain

269

25 Nov 2024

Language Driven Occupancy Prediction

488

25 Nov 2024

VideoOrion: Tokenizing Object Dynamics in Videos

Sipeng Zheng

Zongqing Lu

406

25 Nov 2024

RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

Computer Vision and Pattern Recognition (CVPR), 2024

841

25 Nov 2024

Generative Omnimatte: Learning to Decompose Video into Layers

Computer Vision and Pattern Recognition (CVPR), 2024

462

25 Nov 2024

There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks

317

22 Nov 2024

VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing

294

22 Nov 2024

Segment Anything in Light Fields for Real-Time Applications via Constrained Prompting

Nikolai Goncharov

Donald G. Dansereau

VLM

225

21 Nov 2024

Learning Generalizable 3D Manipulation With 10 Demonstrations

238

15 Nov 2024

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

Computer Vision and Pattern Recognition (CVPR), 2024

393

15 Nov 2024

CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation

665

15 Nov 2024

OneNet: A Channel-Wise 1D Convolutional U-Net

416

14 Nov 2024

Watermark Anything with Localized Messages

International Conference on Learning Representations (ICLR), 2024

456

11 Nov 2024

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Computer Vision and Pattern Recognition (CVPR), 2024

459

07 Nov 2024

ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy

International Conference on Learning Representations (ICLR), 2024

343

06 Nov 2024

MultiDepth: Multi-Sample Priors for Refining Monocular Metric Depth Estimations in Indoor Scenes

132

01 Nov 2024

ZIM: Zero-Shot Image Matting for Anything

337

01 Nov 2024

TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation

Neural Information Processing Systems (NeurIPS), 2024

373

31 Oct 2024

EchoFM: Foundation Model for Generalizable Echocardiogram Analysis

IEEE Transactions on Medical Imaging (IEEE TMI), 2024

Yiwei Li

265

30 Oct 2024

ReferEverything: Towards Segmenting Everything We Can Speak of in Videos

285

30 Oct 2024

Addressing Issues with Working Memory in Video Object Segmentation

105

29 Oct 2024

MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis

421

28 Oct 2024

Frontiers in Intelligent Colonoscopy

401

22 Oct 2024