SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2024
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
    VLM, MLLM

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 859 papers shown
A multi-modal tactile fingertip design for robotic hands to enhance dexterous manipulation
Zhuowei Xu
Zilin Si
Kevin Zhang
Oliver Kroemer
Zeynep Temel
132
0
0
06 Oct 2025
SPEGNet: Synergistic Perception-Guided Network for Camouflaged Object Detection
Baber Jan
Saeed Anwar
Aiman El-Maleh
Abdul Jabbar Siddiqui
Abdul Bais
143
0
0
06 Oct 2025
Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert
Mingyu Liu
Zheng Huang
Xiaoyi Lin
Huanyi Zheng
Canyu Zhao
Zongze Du
Y. Wang
Haoyi Zhu
Hao Chen
Chunhua Shen
141
0
0
04 Oct 2025
SAMSOD: Rethinking SAM Optimization for RGB-T Salient Object Detection
Zhengyi Liu
Xinrui Wang
Xianyong Fang
Zhengzheng Tu
Linbo Wang
122
1
0
04 Oct 2025
EmbodiSwap for Zero-Shot Robot Imitation Learning
Eadom Dessalene
P. Mantripragada
Michael Maynord
Yiannis Aloimonos
LM&Ro
112
1
0
04 Oct 2025
Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields
Zhiting Mei
Ola Shorinwa
Anirudha Majumdar
137
1
0
03 Oct 2025
Med-K2N: Flexible K-to-N Modality Translation for Medical Image Synthesis
Feng Yuan
Yifan Gao
Yuehua Ye
Haoyue Li
Xin Gao
MedIm
86
0
0
03 Oct 2025
Towards Scalable and Consistent 3D Editing
Ruihao Xia
Yang Tang
Pan Zhou
DiffM
144
2
0
03 Oct 2025
Dynamic Prompt Generation for Interactive 3D Medical Image Segmentation Training
Tidiane Camaret N'dir
Alexander Pfefferle
Robin Tibor Schirrmeister
MedIm, 3DH
313
2
0
03 Oct 2025
Inferring Dynamic Physical Properties from Video Foundation Models
Guanqi Zhan
Xianzheng Ma
Weidi Xie
Andrew Zisserman
VGen
156
2
0
02 Oct 2025
When Tracking Fails: Analyzing Failure Modes of SAM2 for Point-Based Tracking in Surgical Videos
Woowon Jang
Jiwon Im
Juseung Choi
Niki Rashidian
W. D. Neve
Utku Ozbulak
119
0
0
02 Oct 2025
Holistic Order Prediction in Natural Scenes
Pierre Musacchio
Hyunmin Lee
Jaesik Park
3DV
259
0
0
02 Oct 2025
IMAGEdit: Let Any Subject Transform
Fei Shen
Weihao Xu
Rui Yan
Dong Zhang
Xiangbo Shu
Jinhui Tang
VGen
120
1
0
01 Oct 2025
Affordance-Guided Diffusion Prior for 3D Hand Reconstruction
Naru Suzuki
Takehiko Ohkawa
Tatsuro Banno
Jihyun Lee
Ryosuke Furuta
Yoichi Sato
DiffM
161
1
0
01 Oct 2025
Instant4D: 4D Gaussian Splatting in Minutes
Zhanpeng Luo
Haoxi Ran
Li Lu
3DGS, VGen
177
1
0
01 Oct 2025
Assessing Foundation Models for Mold Colony Detection with Limited Training Data
Henrik Pichler
Janis Keuper
Matthew Copping
87
0
0
01 Oct 2025
Robust Context-Aware Object Recognition
Klara Janouskova
Cristian Gavrus
Jiří Matas
195
0
0
01 Oct 2025
Domain-Specialized Interactive Segmentation Framework for Meningioma Radiotherapy Planning
J. Lee
Han Jang
Kyu Sung Choi
68
0
0
01 Oct 2025
Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline
Haiyang Li
Yaxiong Wang
Lianwei Wu
Lechao Cheng
Zhun Zhong
195
2
0
30 Sep 2025
The 1st Solution for MOSEv1 Challenge on LSVOS 2025: CGFSeg
Tingmin Li
Yixuan Li
Yang Yang
VOS
215
0
0
30 Sep 2025
A Systematic Study of Large Language Models for Task and Motion Planning With PDDLStream
Jorge Mendez-Mendez
LRM
110
1
0
30 Sep 2025
Cat: Post-Training Quantization Error Reduction via Cluster-based Affine Transformation
Ali Zoljodi
Radu Timofte
Masoud Daneshtalab
MQ
148
0
0
30 Sep 2025
NeoWorld: Neural Simulation of Explorable Virtual Worlds via Progressive 3D Unfolding
Yanpeng Zhao
Shanyan Guan
Y Samuel Wang
Yanhao Ge
Wei-Jang Li
Xiaokang Yang
VGen
131
0
0
29 Sep 2025
Triangle Splatting+: Differentiable Rendering with Opaque Triangles
Jan Held
Renaud Vandeghen
Sanghyun Son
Daniel Rebain
Matheus Gadelha
Yi Zhou
Ming-Chyuan Lin
Marc Van Droogenbroeck
Andrea Tagliasacchi
3DGS
118
2
0
29 Sep 2025
IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks
Eric Hannus
Miika Malin
Tran Minh Son Le
Ville Kyrki
VLM
96
1
0
29 Sep 2025
LayerD: Decomposing Raster Graphic Designs into Layers
Tomoyuki Suzuki
Kang-Jun Liu
Naoto Inoue
Kota Yamaguchi
158
3
0
29 Sep 2025
Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models
Beomseok Kang
Niluthpol Chowdhury Mithun
Mikhail Sizintsev
Han-Pang Chiu
S. Samarasekera
104
0
0
28 Sep 2025
Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning
Y Samuel Wang
Zeyu Xue
Mujie Liu
Tongqin Zhang
Yan Hu
Zhou Zhao
Chenguang Yang
Zhenyu Lu
179
0
0
27 Sep 2025
RAU: Reference-based Anatomical Understanding with Vision Language Models
Yiwei Li
Y. Liu
Jiaqi Guo
Lin Zhao
Zheyuan Zhang
Xiao Chen
Boris Mailhe
Ankush Mukherjee
Terrence Chen
Shanhui Sun
148
2
0
26 Sep 2025
PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data
Zhe Zhu
Le Wan
Rui-Xue Xu
Y. Zhang
Honghua Chen
Zhiyang Dou
Cheng Lin
Yuan Liu
Mingqiang Wei
VLM
200
1
0
26 Sep 2025
CubistMerge: Spatial-Preserving Token Merging For Diverse ViT Backbones
Wenyi Gong
Mieszko Lis
155
0
0
26 Sep 2025
SingRef6D: Monocular Novel Object Pose Estimation with a Single RGB Reference
Jiahui Wang
H. Zhu
Haoren Guo
Abdullah Al Mamun
Cheng Xiang
T. Lee
132
0
0
26 Sep 2025
MultiCrafter: High-Fidelity Multi-Subject Generation via Disentangled Attention and Identity-Aware Preference Alignment
Tao Wu
Yibo Jiang
Yehao Lu
Zhizhong Wang
Longxiang Zhang
Zequn Qin
Xi Li
206
1
0
26 Sep 2025
VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation
Huayi Zhou
Kui Jia
LM&Ro
191
0
0
26 Sep 2025
Geo-R1: Improving Few-Shot Geospatial Referring Expression Understanding with Reinforcement Fine-Tuning
Zilun Zhang
Zian Guan
T. Zhao
H. Shen
Tianyu Li
Yuxiang Cai
Zhonggen Su
Zhaojun Liu
Jianwei Yin
Xiang Li
ObjD, LRM
242
3
0
26 Sep 2025
SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks
Jialiang Li
Wenzheng Wu
Gaojing Zhang
Yifan Han
Wenzhao Lian
LM&Ro
132
0
0
26 Sep 2025
LG-CD: Enhancing Language-Guided Change Detection through SAM2 Adaptation
Yixiao Liu
Yizhou Yang
Jinwen Li
Jun Tao
R. Li
Xiangkun Wang
Min Zhu
Junlong Cheng
160
0
0
26 Sep 2025
RefAM: Attention Magnets for Zero-Shot Referral Segmentation
Anna Kukleva
Enis Simsar
A. Tonioni
Muhammad Ferjad Naeem
F. Tombari
J. E. Lenssen
Bernt Schiele
DiffM, VLM
641
0
0
26 Sep 2025
Drag4D: Align Your Motion with Text-Driven 3D Scene Generation
Minjun Kang
Inkyu Shin
Taeyeop Lee
In So Kweon
Kuk-Jin Yoon
117
0
0
26 Sep 2025
NewtonGen: Physics-Consistent and Controllable Text-to-Video Generation via Neural Newtonian Dynamics
Yu Yuan
Xijun Wang
Tharindu Wickremasinghe
Zeeshan Nadir
Bole Ma
Stanley H. Chan
DiffM, VGen, PINN
1.5K
10
0
25 Sep 2025
Dense Semantic Matching with VGGT Prior
Songlin Yang
Tianyi Wei
Yushi Lan
Zeqi Xiao
Anyi Rao
Xingang Pan
3DV
192
0
0
25 Sep 2025
Joint Flow Trajectory Optimization For Feasible Robot Motion Generation from Video Demonstrations
Xiaoxiang Dong
Matthew Johnson-Roberson
Weiming Zhi
85
0
0
25 Sep 2025
UniTransfer: Video Concept Transfer via Progressive Spatial and Timestep Decomposition
Guojun Lei
Rong Zhang
Chi-Yin Wang
Tianhang Liu
Hong Li
Zhiyuan Ma
W. Xu
VGen
154
0
0
25 Sep 2025
Does FLUX Already Know How to Perform Physically Plausible Image Composition?
Shilin Lu
Zhuming Lian
Zihan Zhou
Shaocong Zhang
Chen Zhao
A. Kong
311
11
0
25 Sep 2025
Neptune-X: Active X-to-Maritime Generation for Universal Maritime Object Detection
Yu Guo
Shengfeng He
Yuxu Lu
Haonan An
Yihang Tao
Huilin Zhu
Jingxian Liu
Yuguang Fang
246
1
0
25 Sep 2025
Video models are zero-shot learners and reasoners
Thaddäus Wiedemer
Yuxuan Li
Paul Vicol
Shixiang Shane Gu
Nick Matarese
Kevin Swersky
Been Kim
P. Jaini
Robert Geirhos
VLM, LRM
248
56
0
24 Sep 2025
Attack for Defense: Adversarial Agents for Point Prompt Optimization Empowering Segment Anything Model
Xueyu Liu
Xiaoyi Zhang
Guangze Shi
Meilin Liu
Yexin Lai
Yongfei Wu
Mingqiang Wei
LLMAG, AAML
108
1
0
23 Sep 2025
MV-UMI: A Scalable Multi-View Interface for Cross-Embodiment Learning
Omar Rayyan
John Abanes
Mahmoud Hafez
Anthony Tzes
Fares Abu-Dakka
102
0
0
23 Sep 2025
Sa2VA-i: Improving Sa2VA Results with Consistent Training and Inference
Alexey Nekrasov
A. Athar
Daan de Geus
Alexander Hermans
Bastian Leibe
172
0
0
23 Sep 2025
The 1st Solution for MOSEv2 Challenge 2025: Long-term and Concept-aware Video Segmentation via SeC
Mingqi Gao
Jingkun Chen
Yunqi Miao
Gengshen Wu
Zhijin Qin
Jungong Han
120
0
0
23 Sep 2025