ResearchTrend.AI
SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2025
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
    VLM, MLLM
arXiv: 2408.00714

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 863 papers shown
Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation
Computer Vision and Pattern Recognition (CVPR), 2025
Tanner Schmidt
Richard Newcombe
VLM
234
2
0
10 Jun 2025
HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation
Ziyao Huang
Zixiang Zhou
Juan Cao
Yifeng Ma
Yi Chen
...
Hongmei Wang
Qin Lin
Yuan Zhou
Qinglin Lu
Fan Tang
VGen
225
5
0
10 Jun 2025
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
Teng Hu
Zhentao Yu
Zhengguang Zhou
Jiangning Zhang
Yuan Zhou
Qinglin Lu
Ran Yi
VGen
238
4
0
09 Jun 2025
Snap, Segment, Deploy: A Visual Data and Detection Pipeline for Wearable Industrial Assistants
Di Wen
Junwei Zheng
R. Liu
Yi Xu
Kunyu Peng
Rainer Stiefelhagen
183
1
0
09 Jun 2025
Versatile Loco-Manipulation through Flexible Interlimb Coordination
Xinghao Zhu
Yuxin Chen
Lingfeng Sun
Farzad Niroui
Simon Le Cleac'h
Jiuguang Wang
Kuan Fang
300
6
0
09 Jun 2025
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal
Reza Shirkavand
Heng-Chiao Huang
Gowthami Somepalli
Tom Goldstein
288
3
0
09 Jun 2025
LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds
Computer Vision and Pattern Recognition (CVPR), 2025
Zihui Zhang
Weisheng Dai
Hongtao Wen
Bo Yang
3DPC
219
3
0
09 Jun 2025
HOI-PAGE: Zero-Shot Human-Object Interaction Generation with Part Affordance Guidance
Lei Li
Angela Dai
181
0
0
08 Jun 2025
THU-Warwick Submission for EPIC-KITCHEN Challenge 2025: Semi-Supervised Video Object Segmentation
Mingqi Gao
Haoran Duan
Tianlu Zhang
Jungong Han
154
0
0
07 Jun 2025
EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs
Ivan Rodin
Tz-Ying Wu
Kyle Min
S. N. Sridhar
Antonino Furnari
Subarna Tripathi
G. Farinella
210
0
0
06 Jun 2025
MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping
V. Bhat
Naman Patel
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
291
0
0
06 Jun 2025
BiAssemble: Learning Collaborative Affordance for Bimanual Geometric Assembly
Yan Shen
Kai Cheng
Yubin Ke
Xinyuan Song
Zeyi Li
Xiaoqi Li
Hongwei Fan
Haoran Lu
Hao Dong
356
1
0
06 Jun 2025
3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model
Hongyan Zhi
Peihao Chen
Siyuan Zhou
Yubo Dong
Quanxi Wu
Lei Han
Mingkui Tan
403
14
0
06 Jun 2025
PyGemini: Unified Software Development towards Maritime Autonomy Systems
Kjetil Vasstein
Christian Le
Simon Lervåg Breivik
Trygve Maukon Myhr
Annette Stahl
Edmund Førland Brekke
180
4
0
06 Jun 2025
ChronoTailor: Harnessing Attention Guidance for Fine-Grained Video Virtual Try-On
Jinjuan Wang
Wenzhang Sun
Ming Li
Y. Zheng
Fanyao Li
Zhulin Tao
Donglin Di
Hao Li
Wei Chen
Xianglin Huang
VGen, AI4TS
221
1
0
06 Jun 2025
Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels
Olaf Dünkel
Thomas Wimmer
Christian Theobalt
Christian Rupprecht
Adam Kortylewski
3DPC
532
4
0
05 Jun 2025
Controlled Data Rebalancing in Multi-Task Learning for Real-World Image Super-Resolution
Shuchen Lin
Mingtao Feng
Weisheng Dong
Fangfang Wu
Jianqiao Luo
Yaonan Wang
Guangming Shi
154
1
0
05 Jun 2025
Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline
Computer Vision and Pattern Recognition (CVPR), 2025
Yuzhi Huang
Chenxin Li
H. Zhang
Zixu Lin
Yunlong Lin
...
Xinyu Liu
Jiechao Gao
Yue Huang
Xinghao Ding
Yixuan Yuan
251
2
0
05 Jun 2025
UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting
Jaehoon Choi
Dongki Jung
Christopher Maxey
Yonghan Lee
Sungmin Eum
Dinesh Manocha
Heesung Kwon
3DGS
290
1
0
05 Jun 2025
Object-centric 3D Motion Field for Robot Learning from Human Videos
Zhao-Heng Yin
Sherry Yang
Pieter Abbeel
269
5
0
04 Jun 2025
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting
Shengjie Lin
Jiading Fang
Muhammad Zubair Irshad
Vitor Campagnolo Guizilini
Rares Andrei Ambrus
G. Shakhnarovich
Matthew R. Walter
3DGS
279
2
0
04 Jun 2025
HuGeDiff: 3D Human Generation via Diffusion with Gaussian Splatting
Maksym Ivashechkin
Oscar Mendez
Richard Bowden
3DGS
213
0
0
04 Jun 2025
Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Theodore Barfoot
Luis C. Garcia-Peraza-Herrera
Samet Akcay
Ben Glocker
Tom Vercauteren
UQCV
489
0
0
04 Jun 2025
Object-level Self-Distillation for Vision Pretraining
Çağlar Hızlı
Çağatay Yıldız
Pekka Marttinen
OCL, VLM
330
0
0
04 Jun 2025
Puck Localization Using Contextual Cues
Liam Salass
Jerrin Bright
Amir Nazemi
Yuhao Chen
John S. Zelek
David A Clausi
181
0
0
04 Jun 2025
Grounded Vision-Language Interpreter for Integrated Task and Motion Planning
Jeremy Siburian
Keisuke Shirai
C. C. Beltran-Hernandez
Masashi Hamaya
Michael Görner
Atsushi Hashimoto
282
2
0
03 Jun 2025
SAMJ: Fast Image Annotation on ImageJ/Fiji via Segment Anything Model
Carlos Garcia-Lopez-de-Haro
Caterina Fuster-Barcelo
Curtis T. Rueden
Jonathan Heras
Vladimir Ulman
...
Kevin W. Eliceiri
Jean-Christophe Olivo-Marin
Jean-Yves Tinevez
Daniel Sage
A. Muñoz-Barrutia
VLM
164
0
0
03 Jun 2025
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa
Thomas Fel
Ekdeep Singh Lubana
Bahareh Tolooshams
Demba Ba
323
10
0
03 Jun 2025
Technical Report for Ego4D Long-Term Action Anticipation Challenge 2025
Qiaohui Chu
Haoyu Zhang
Yisen Feng
Meng Liu
Weili Guan
Yaowei Wang
Liqiang Nie
275
4
0
03 Jun 2025
Controllable Human-centric Keyframe Interpolation with Generative Prior
Z. Guo
Size Wu
Zhongang Cai
Wei Li
Chen Change Loy
DiffM, VGen
205
1
0
03 Jun 2025
Zero-Shot Tree Detection and Segmentation from Aerial Forest Imagery
Michelle Chen
David Russell
Amritha Pallavoor
Derek Young
Jane Wu
VLM
218
2
0
03 Jun 2025
Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability
Yarden Bakish
Itamar Zimerman
Hila Chefer
Lior Wolf
221
3
0
02 Jun 2025
EarthMind: Leveraging Cross-Sensor Data for Advanced Earth Observation Interpretation with a Unified Multimodal LLM
Yan Shu
Bin Ren
Zhitong Xiong
Danda Pani Paudel
Luc Van Gool
Begüm Demir
Andrii Zadaianchuk
Paolo Rota
VLM
253
1
0
02 Jun 2025
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control
Xiao Fu
Xintao Wang
Xian Liu
Jianhong Bai
R. Xu
Pengfei Wan
Di Zhang
Dahua Lin
VGen
272
15
0
02 Jun 2025
SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost
Computer Vision and Pattern Recognition (CVPR), 2025
Haiyang Mei
Pengyu Zhang
Mike Zheng Shou
VLM
250
4
0
02 Jun 2025
No Train Yet Gain: Towards Generic Multi-Object Tracking in Sports and Beyond
Tomasz Stanczyk
Seongro Yoon
François Brémond
259
3
0
02 Jun 2025
AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting
Yuyuan Liu
Yuanhong Chen
Chong Wang
Junlin Han
Junde Wu
Can Peng
Jingkun Chen
Yu Tian
Gustavo Carneiro
VLM
318
0
0
01 Jun 2025
Depth-Aware Scoring and Hierarchical Alignment for Multiple Object Tracking
International Conference on Image Processing (ICIP), 2025
Milad Khanchi
Maria Amer
Charalambos Poullis
VOT
225
2
0
01 Jun 2025
iDPA: Instance Decoupled Prompt Attention for Incremental Medical Object Detection
Huahui Yi
Wei Xu
Ziyuan Qin
Xi Chen
Xiaohu Wu
Kang Li
Qicheng Lao
VLM
149
0
0
31 May 2025
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
Xingtong Ge
Xin Zhang
Tongda Xu
Yi Zhang
Xinjie Zhang
Yan Wang
Jun Zhang
DiffM
235
6
0
31 May 2025
ViVo: A Dataset for Volumetric Video Reconstruction and Compression
Adrian Azzarelli
Ge Gao
Ho Man Kwan
Fan Zhang
Qirui Yang
Ollie Moolan-Feroze
David Bull
3DH
249
1
0
31 May 2025
Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control
Danfeng Li
Hui Zhang
Sheng Wang
Jiacheng Li
Zuxuan Wu
DiffM, VLM
354
2
0
31 May 2025
Leadership Assessment in Pediatric Intensive Care Unit Team Training
Liangyang Ouyang
Yuki Sakai
Ryosuke Furuta
Hisataka Nozawa
Hikoro Matsui
Yoichi Sato
442
1
0
30 May 2025
Time Blindness: Why Video-Language Models Can't See What Humans Can?
Ujjwal Upadhyay
Mukul Ranjan
Zhiqiang Shen
Mohamed Elhoseiny
VLM
220
3
0
30 May 2025
GenSpace: Benchmarking Spatially-Aware Image Generation
Zehan Wang
Jiayang Xu
Ziang Zhang
Tianyu Pan
Chao Du
Hengshuang Zhao
Zhou Zhao
EGVM
281
2
0
30 May 2025
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Chenhao Zheng
Jieyu Zhang
Mohammadreza Salehi
Ziqi Gao
Vishnu Iyengar
Norimasa Kobori
Quan Kong
Ranjay Krishna
391
2
0
29 May 2025
PixelThink: Towards Efficient Chain-of-Pixel Reasoning
Song Wang
Gongfan Fang
Lingdong Kong
Xiangtai Li
Jianyun Xu
Maochun Luo
Qiang Li
Jianke Zhu
Xinchao Wang
LRM
342
13
0
29 May 2025
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
Chenbin Pan
Wenbin He
Zhengzhong Tu
Liu Ren
LRM, VLM
507
2
0
29 May 2025
Generating Fit Check Videos with a Handheld Camera
B. Chen
Brian L. Curless
Ira Kemelmacher-Shlizerman
Steven M. Seitz
DiffM
214
0
0
29 May 2025
CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation
Pardis Taghavi
Tian Liu
Renjie Li
Reza Langari
Zhengzhong Tu
ISeg
516
0
0
28 May 2025
Page 10 of 18