SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2025

1 August 2024

Valentin Gabeur

Chaitanya K. Ryali

Roman Rädle

Laura Gustafson

Kalyan Vasudev Alwala

Ross B. Girshick

Piotr Dollár

Christoph Feichtenhofer

arXiv: 2408.00714 (abstract, PDF, HTML); Hugging Face: 116 upvotes

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 861 papers shown

High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight

Computer Vision and Pattern Recognition (CVPR), 2025

Cédric Vincent

19 Mar 2025

EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining

19 Mar 2025

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene

Computer Vision and Pattern Recognition (CVPR), 2025

19 Mar 2025

Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

Hassan Abu Alhaija

Jose M. Alvarez

...

18 Mar 2025

AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations

Quang-Trung Truong

Duc Thanh Nguyen

17 Mar 2025

SAM2 for Image and Video Segmentation: A Comprehensive Survey

17 Mar 2025

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

17 Mar 2025

SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint

17 Mar 2025

VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility

Rainer Stiefelhagen

16 Mar 2025

SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs

Computer Vision and Pattern Recognition (CVPR), 2025

16 Mar 2025

GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing

16 Mar 2025

SPOC: Spatially-Progressing Object State Change Segmentation in Video

Priyanka Mandikal

Tushar Nagarajan

Kristen Grauman

15 Mar 2025

TACO: Taming Diffusion for in-the-wild Video Amodal Completion

15 Mar 2025

ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis

Gedas Bertasius

15 Mar 2025

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

Computer Vision and Pattern Recognition (CVPR), 2025

15 Mar 2025

PSF-4D: A Progressive Sampling Framework for View Consistent 4D Editing

14 Mar 2025

EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting

14 Mar 2025

Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling

Christopher Xie

Henry Howard-Jenkins

Richard Newcombe

Vasileios Balntas

Jakob Julian Engel

14 Mar 2025

Large-scale Pre-training for Grounded Video Caption Generation

Evangelos Kazakos

Cordelia Schmid

13 Mar 2025

The Power of One: A Single Example is All it Takes for Segmentation in VLMs

Mir Rayat Imtiaz Hossain

Mennatullah Siam

James J. Little

13 Mar 2025

IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models

Oluwatobiloba Adesanya

13 Mar 2025

Towards Fast, Memory-based and Data-Efficient Vision-Language Policy

13 Mar 2025

Do computer vision foundation models learn the low-level characteristics of the human visual system?

Computer Vision and Pattern Recognition (CVPR), 2025

13 Mar 2025

GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

13 Mar 2025

LuciBot: Automated Robot Policy Learning from Generated Videos

Tsun-Hsuan Wang

12 Mar 2025

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop

12 Mar 2025

2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos

Marvin Heidinger

Georgia Chalvatzaki

12 Mar 2025

V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video

11 Mar 2025

WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images

11 Mar 2025

Referring to Any Person

11 Mar 2025

FAM-HRI: Foundation-Model Assisted Multi-Modal Human-Robot Interaction Combining Gaze and Speech

Benjamin Kiefer

11 Mar 2025

MetaFold: Language-Guided Multi-Category Garment Folding Framework via Trajectory Generation and Foundation Model

...

11 Mar 2025

VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion

11 Mar 2025

YOLOE: Real-Time Seeing Anything

10 Mar 2025

RS2-SAM2: Customized SAM2 for Referring Remote Sensing Image Segmentation

10 Mar 2025

OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation

10 Mar 2025

MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation

09 Mar 2025

SAQ-SAM: Semantically-Aligned Quantization for Segment Anything Model

09 Mar 2025

Online Dense Point Tracking with Streaming Memory

09 Mar 2025

Improving SAM for Camouflaged Object Detection via Dual Stream Adapters

08 Mar 2025

Differentiable Rendering-based Pose Estimation for Surgical Robotic Instruments

Florian Richter

07 Mar 2025

Instrument-Splatting: Controllable Photorealistic Reconstruction of Surgical Instruments Using Gaussian Splatting

International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025

Septimiu E. Salcudean

06 Mar 2025

Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025

Michael Milford

Alejandro Fontan

06 Mar 2025

Shaken, Not Stirred: A Novel Dataset for Visual Understanding of Glasses in Human-Robot Bartending Tasks

Lukás Gajdosech

Jan-Gerrit Habekost

Matthias Kerzel

06 Mar 2025

Surgical Gaussian Surfels: Highly Accurate Real-time Surgical Scene Rendering using Gaussian Surfels

Idris O. Sunmola

Samuel Schmidgall

Paul Maria Scheikl

06 Mar 2025

Conformal In-Context Reverse Classification Accuracy: Efficient Estimation of Segmentation Quality with Statistical Guarantees

Matias Cosarinsky

Gabriel Gimenez

Nicolás Gaggion

06 Mar 2025

WeakMedSAM: Weakly-Supervised Medical Image Segmentation via SAM with Sub-Class Exploration and Prompt Affinity Mining

IEEE Transactions on Medical Imaging (IEEE TMI), 2025

06 Mar 2025

CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance

Harshit S. Sikchi

05 Mar 2025

SurgiSAM2: Fine-tuning a foundational model for surgical video anatomy segmentation and detection

Devanish N. Kamtam

Joseph B. Shrager

Satya Deepya Malla

Juan J. Cardona

Serena Yeung-Levy

05 Mar 2025

AirExo-2: Scaling up Generalizable Robotic Imitation Learning with Low-Cost Exoskeletons

...

05 Mar 2025
