arXiv: 2408.00714

SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2025

1 August 2024

Valentin Gabeur

Chaitanya K. Ryali

Roman Rädle

Laura Gustafson

Kalyan Vasudev Alwala

Ross B. Girshick

Piotr Dollár

Christoph Feichtenhofer


Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 859 papers shown

A multi-modal tactile fingertip design for robotic hands to enhance dexterous manipulation

132

0

0

06 Oct 2025

SPEGNet: Synergistic Perception-Guided Network for Camouflaged Object Detection

Abdul Jabbar Siddiqui

143

0

0

06 Oct 2025

Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert

141

0

0

04 Oct 2025

SAMSOD: Rethinking SAM Optimization for RGB-T Salient Object Detection

122

1

0

04 Oct 2025

EmbodiSwap for Zero-Shot Robot Imitation Learning

Eadom Dessalene

P. Mantripragada

Michael Maynord

Yiannis Aloimonos

112

1

0

04 Oct 2025

Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields

Anirudha Majumdar

137

1

0

03 Oct 2025

Med-K2N: Flexible K-to-N Modality Translation for Medical Image Synthesis

86

0

0

03 Oct 2025

Towards Scalable and Consistent 3D Editing

144

2

0

03 Oct 2025

Dynamic Prompt Generation for Interactive 3D Medical Image Segmentation Training

Tidiane Camaret N'dir

Alexander Pfefferle

Robin Tibor Schirrmeister

313

2

0

03 Oct 2025

Inferring Dynamic Physical Properties from Video Foundation Models

Andrew Zisserman

156

2

0

02 Oct 2025

When Tracking Fails: Analyzing Failure Modes of SAM2 for Point-Based Tracking in Surgical Videos

119

0

0

02 Oct 2025

Holistic Order Prediction in Natural Scenes

Pierre Musacchio

259

0

0

02 Oct 2025

IMAGEdit: Let Any Subject Transform

120

1

0

01 Oct 2025

Affordance-Guided Diffusion Prior for 3D Hand Reconstruction

Takehiko Ohkawa

161

1

0

01 Oct 2025

Instant4D: 4D Gaussian Splatting in Minutes

177

1

0

01 Oct 2025

Assessing Foundation Models for Mold Colony Detection with Limited Training Data

Matthew Copping

87

0

0

01 Oct 2025

Robust Context-Aware Object Recognition

Klara Janouskova

Cristian Gavrus

195

0

0

01 Oct 2025

Domain-Specialized Interactive Segmentation Framework for Meningioma Radiotherapy Planning

68

0

0

01 Oct 2025

Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline

195

2

0

30 Sep 2025

The 1st Solution for MOSEv1 Challenge on LSVOS 2025: CGFSeg

215

0

0

30 Sep 2025

A Systematic Study of Large Language Models for Task and Motion Planning With PDDLStream

Jorge Mendez-Mendez

110

1

0

30 Sep 2025

Cat: Post-Training Quantization Error Reduction via Cluster-based Affine Transformation

Masoud Daneshtalab

148

0

0

30 Sep 2025

NeoWorld: Neural Simulation of Explorable Virtual Worlds via Progressive 3D Unfolding

131

0

0

29 Sep 2025

Triangle Splatting+: Differentiable Rendering with Opaque Triangles

Renaud Vandeghen

Matheus Gadelha

Ming-Chyuan Lin

Marc Van Droogenbroeck

Andrea Tagliasacchi

118

2

0

29 Sep 2025

IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks

Tran Minh Son Le

96

1

0

29 Sep 2025

LayerD: Decomposing Raster Graphic Designs into Layers

Tomoyuki Suzuki

158

3

0

29 Sep 2025

Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models

Niluthpol Chowdhury Mithun

Mikhail Sizintsev

S. Samarasekera

104

0

0

28 Sep 2025

Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning

179

0

0

27 Sep 2025

RAU: Reference-based Anatomical Understanding with Vision Language Models

Ankush Mukherjee

148

2

0

26 Sep 2025

PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data

200

1

0

26 Sep 2025

CubistMerge: Spatial-Preserving Token Merging For Diverse ViT Backbones

155

0

0

26 Sep 2025

SingRef6D: Monocular Novel Object Pose Estimation with a Single RGB Reference

Abdullah Al Mamun

132

0

0

26 Sep 2025

MultiCrafter: High-Fidelity Multi-Subject Generation via Disentangled Attention and Identity-Aware Preference Alignment

Longxiang Zhang

206

1

0

26 Sep 2025

VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation

191

0

0

26 Sep 2025

Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning

242

3

0

26 Sep 2025

SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks

132

0

0

26 Sep 2025

LG-CD: Enhancing Language-Guided Change Detection through SAM2 Adaptation

160

0

0

26 Sep 2025

RefAM: Attention Magnets for Zero-Shot Referral Segmentation

Muhammad Ferjad Naeem

641

0

0

26 Sep 2025

Drag4D: Align Your Motion with Text-Driven 3D Scene Generation

117

0

0

26 Sep 2025

NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics

Tharindu Wickremasinghe

Stanley H. Chan

DiffM VGen PINN

1.5K

10

0

25 Sep 2025

Dense Semantic Matching with VGGT Prior

192

0

0

25 Sep 2025

Joint Flow Trajectory Optimization For Feasible Robot Motion Generation from Video Demonstrations

Matthew Johnson-Roberson

85

0

0

25 Sep 2025

UniTransfer: Video Concept Transfer via Progressive Spatial and Timestep Decomposition

154

0

0

25 Sep 2025

Does FLUX Already Know How to Perform Physically Plausible Image Composition?

311

11

0

25 Sep 2025

Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection

246

1

0

25 Sep 2025

Video models are zero-shot learners and reasoners

Thaddäus Wiedemer

Shixiang Shane Gu

248

56

0

24 Sep 2025

Attack for Defense: Adversarial Agents for Point Prompt Optimization Empowering Segment Anything Model

108

1

0

23 Sep 2025

MV-UMI: A Scalable Multi-View Interface for Cross-Embodiment Learning

Fares Abu-Dakka

102

0

0

23 Sep 2025

Sa2VA-i: Improving Sa2VA Results with Consistent Training and Inference

Alexey Nekrasov

Alexander Hermans

172

0

0

23 Sep 2025

The 1st Solution for MOSEv2 Challenge 2025: Long-term and Concept-aware Video Segmentation via SeC

120

0

0

23 Sep 2025
