arXiv: 2408.00714

SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2025

1 August 2024

Valentin Gabeur

Chaitanya K. Ryali

Roman Rädle

Laura Gustafson

Kalyan Vasudev Alwala

Ross B. Girshick

Piotr Dollár

Christoph Feichtenhofer

ArXiv (abs) | PDF | HTML | HuggingFace (116 upvotes)

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 863 papers shown

Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation

Computer Vision and Pattern Recognition (CVPR), 2025

Richard Newcombe

234

2

0

10 Jun 2025

HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation

...

225

5

0

10 Jun 2025

PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement

Zhengguang Zhou

Jiangning Zhang

238

4

0

09 Jun 2025

Snap, Segment, Deploy: A Visual Data and Detection Pipeline for Wearable Industrial Assistants

Rainer Stiefelhagen

183

1

0

09 Jun 2025

Versatile Loco-Manipulation through Flexible Interlimb Coordination

Simon Le Cleac'h

300

6

0

09 Jun 2025

ARGUS: Hallucination and Omission Evaluation in Video-LLMs

Reza Shirkavand

Heng-Chiao Huang

Gowthami Somepalli

288

3

0

09 Jun 2025

LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds

Computer Vision and Pattern Recognition (CVPR), 2025

219

3

0

09 Jun 2025

HOI-PAGE: Zero-Shot Human-Object Interaction Generation with Part Affordance Guidance

181

0

0

08 Jun 2025

THU-Warwick Submission for EPIC-KITCHEN Challenge 2025: Semi-Supervised Video Object Segmentation

154

0

0

07 Jun 2025

EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs

Antonino Furnari

Subarna Tripathi

210

0

0

06 Jun 2025

MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping

Prashanth Krishnamurthy

Farshad Khorrami

291

0

0

06 Jun 2025

BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly

356

1

0

06 Jun 2025

3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model

403

14

0

06 Jun 2025

PyGemini: Unified Software Development towards Maritime Autonomy Systems

Kjetil Vasstein

Simon Lervåg Breivik

Trygve Maukon Myhr

Edmund Førland Brekke

180

4

0

06 Jun 2025

ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On

221

1

0

06 Jun 2025

Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels

Christian Theobalt

Christian Rupprecht

Adam Kortylewski

532

4

0

05 Jun 2025

Controlled Data Rebalancing in Multi-Task Learning for Real-World Image Super-Resolution

154

1

0

05 Jun 2025

Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline

Computer Vision and Pattern Recognition (CVPR), 2025

...

251

2

0

05 Jun 2025

UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting

Christopher Maxey

290

1

0

05 Jun 2025

Object-centric 3D Motion Field for Robot Learning from Human Videos

269

5

0

04 Jun 2025

SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting

Muhammad Zubair Irshad

Vitor Campagnolo Guizilini

Rares Andrei Ambrus

G. Shakhnarovich

Matthew R. Walter

279

2

0

04 Jun 2025

HuGeDiff: 3D Human Generation via Diffusion with Gaussian Splatting

Maksym Ivashechkin

213

0

0

04 Jun 2025

Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation

Theodore Barfoot

Luis C. Garcia-Peraza-Herrera

Tom Vercauteren

489

0

0

04 Jun 2025

Object-level Self-Distillation for Vision Pretraining

Çağlar Hızlı

Çağatay Yıldız

Pekka Marttinen

330

0

0

04 Jun 2025

Puck Localization Using Contextual Cues

181

0

0

04 Jun 2025

Grounded Vision-Language Interpreter for Integrated Task and Motion Planning

Jeremy Siburian

C. C. Beltran-Hernandez

Michael Görner

Atsushi Hashimoto

282

2

0

03 Jun 2025

SAMJ: Fast Image Annotation on ImageJ/Fiji via Segment Anything Model

Carlos Garcia-Lopez-de-Haro

Caterina Fuster-Barcelo

Curtis T. Rueden

...

Kevin W. Eliceiri

Jean-Christophe Olivo-Marin

Jean-Yves Tinevez

A. Muñoz-Barrutia

164

0

0

03 Jun 2025

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Ekdeep Singh Lubana

Bahareh Tolooshams

323

10

0

03 Jun 2025

Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025

275

4

0

03 Jun 2025

Controllable Human-centric Keyframe Interpolation with Generative Prior

Chen Change Loy

205

1

0

03 Jun 2025

Zero-Shot Tree Detection and Segmentation from Aerial Forest Imagery

Amritha Pallavoor

218

2

0

03 Jun 2025

Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability

Itamar Zimerman

221

3

0

02 Jun 2025

EarthMind: Leveraging Cross-Sensor Data for Advanced Earth Observation Interpretation with a Unified Multimodal LLM

Danda Pani Paudel

Andrii Zadaianchuk

253

1

0

02 Jun 2025

Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control

272

15

0

02 Jun 2025

SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost

Computer Vision and Pattern Recognition (CVPR), 2025

Mike Zheng Shou

250

4

0

02 Jun 2025

No Train Yet Gain: Towards Generic Multi-Object Tracking in Sports and Beyond

Tomasz Stanczyk

François Brémond

259

3

0

02 Jun 2025

AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting

Gustavo Carneiro

318

0

0

01 Jun 2025

Depth-Aware Scoring and Hierarchical Alignment for Multiple Object Tracking

International Conference on Information Photonics (ICIP), 2025

Charalambos Poullis

225

2

0

01 Jun 2025

iDPA: Instance Decoupled Prompt Attention for Incremental Medical Object Detection

149

0

0

31 May 2025

SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation

235

6

0

31 May 2025

ViVo: A Dataset for Volumetric Video Reconstruction and Compression

Adrian Azzarelli

Ollie Moolan-Feroze

249

1

0

31 May 2025

Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control

354

2

0

31 May 2025

Leadership Assessment in Pediatric Intensive Care Unit Team Training

Liangyang Ouyang

Hisataka Nozawa

442

1

0

30 May 2025

Time Blindness: Why Video-Language Models Can't See What Humans Can?

Ujjwal Upadhyay

Mohamed Elhoseiny

220

3

0

30 May 2025

GenSpace: Benchmarking Spatially-Aware Image Generation

Hengshuang Zhao

281

2

0

30 May 2025

One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory

Mohammadreza Salehi

Norimasa Kobori

391

2

0

29 May 2025

PixelThink: Towards Efficient Chain-of-Pixel Reasoning

342

13

0

29 May 2025

DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models

507

2

0

29 May 2025

Generating Fit Check Videos with a Handheld Camera

Brian L. Curless

Ira Kemelmacher-Shlizerman

Steven M. Seitz

214

0

0

29 May 2025

CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation

516

0

0

28 May 2025
