SAM 2: Segment Anything in Images and Videos


International Conference on Learning Representations (ICLR), 2024
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
    VLM, MLLM

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 863 papers shown
Adaptive Articulated Object Manipulation On The Fly with Foundation Model Reasoning and Part Grounding
Xiaojie Zhang
Yuanfei Wang
Kai Cheng
Kunqi Xu
Yu Li
Liuyu Xiang
Hao Dong
Zhaofeng He
164
2
0
24 Jul 2025
Moving Object Detection from Moving Camera Using Focus of Expansion Likelihood and Segmentation
Masahiro Ogawa
Qi An
Atsushi Yamashita
155
0
0
18 Jul 2025
Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation
Zhen Xu
Hongyu Zhou
Sida Peng
Haotong Lin
Haoyu Guo
...
Yue Wang
Ruizhen Hu
Yiyi Liao
Xiaowei Zhou
Hujun Bao
VLM
214
3
0
15 Jul 2025
From Wardrobe to Canvas: Wardrobe Polyptych LoRA for Part-level Controllable Human Image Generation
J. Kim
S. Park
Hyoungwoo Park
Sungrack Yun
Jaegul Choo
Seokeon Choi
DiffM
293
0
0
14 Jul 2025
Visuo-Acoustic Hand Pose and Contact Estimation
Yuemin Ma
Uksang Yoo
Yunchao Yao
Shahram Najam Syed
Luca Bondi
Jonathan M Francis
Jean Oh
Jeffrey Ichnowski
150
1
0
13 Jul 2025
From One to More: Contextual Part Latents for 3D Generation
Shaocong Dong
Lihe Ding
Xiao Chen
Yaokun Li
Yuxin Wang
...
Chenjian Gao
Zhanpeng Huang
Zibin Wang
Tianfan Xue
Dan Xu
DiffM
267
8
0
11 Jul 2025
HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking
Ruixiang Chen
Guolei Sun
Yawei Li
Jie Qin
Luca Benini
370
3
0
10 Jul 2025
OTAS: Open-vocabulary Token Alignment for Outdoor Segmentation
Simon Schwaiger
Stefan Thalhammer
Wilfried Wöber
Gerald Steinbauer-Wagner
169
0
0
08 Jul 2025
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion
Aleksandar Jevtić
Christoph Reich
Felix Wimbauer
Oliver Hahn
Christian Rupprecht
Stefan Roth
Daniel Cremers
314
2
0
08 Jul 2025
OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts
Shiting Xiao
Rishabh Kabra
Yuhang Li
Donghyun Lee
João Carreira
Priyadarshini Panda
VLM
315
2
0
07 Jul 2025
ZERO: Industry-ready Vision Foundation Model with Multi-modal Prompts
Sangbum Choi
Kyeongryeol Go
Taewoong Jang
ObjD, VLM
234
0
0
06 Jul 2025
Foundation versus Domain-specific Models: Performance Comparison, Fusion, and Explainability in Face Recognition
Redwan Sony
Parisa Farmanifard
Arun Ross
Anil K. Jain
CVBM, VLM
270
5
0
04 Jul 2025
The Sound of Simulation: Learning Multimodal Sim-to-Real Robot Policies with Generative Audio
Renhao Wang
Haoran Geng
Tingle Li
Feishi Wang
Gopala Anumanchipalli
Trevor Darrell
Boyi Li
Pieter Abbeel
Jitendra Malik
Alexei A. Efros
VGen
234
1
0
03 Jul 2025
SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment
Qi Xu
Dongxu Wei
Lingzhe Zhao
Wenpu Li
Zhangchi Huang
Shunping Ji
Peidong Liu
3DV
283
3
0
03 Jul 2025
NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation
Max Gandyra
Alessandro Santonicola
Michael Beetz
261
1
0
02 Jul 2025
Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning
Qingdong He
Xueqin Chen
Chaoyi Wang
Yanjie Pan
Xiaobin Hu
Zhenye Gan
Yabiao Wang
Chengjie Wang
Xiangtai Li
J. Zhang
213
3
0
02 Jul 2025
LatentMove: Towards Complex Human Movement Video Generation
Ashkan Taghipour
Morteza Ghahremani
Mohammed Bennamoun
F. Boussaïd
Aref Miri Rekavandi
Zinuo Li
Qiuhong Ke
Hamid Laga
3DH, VGen
301
2
0
01 Jul 2025
Geological Everything Model 3D: A Promptable Foundation Model for Unified and Zero-Shot Subsurface Understanding
Yimin Dou
Xinming Wu
Nathan L Bangs
Harpreet Singh Sethi
Jintao Li
Hang Gao
Zhixiang Guo
AI4CE
242
1
0
01 Jul 2025
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
Yuqi Liu
Bohao Peng
Zhisheng Zhong
Zihao Yue
Fanbin Lu
Bei Yu
Jiaya Jia
LRM, VLM
393
46
0
01 Jul 2025
SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures
Fengyi Jiang
Xiaorui Zhang
Lingbo Jin
Ruixing Liang
Yuxin Chen
...
Wenqing Sun
Cong Gao
Hallie McNamara
Jingpei Lu
Omid Mohareri
181
0
0
30 Jun 2025
SCORP: Scene-Consistent Object Refinement via Proxy Generation and Tuning
Ziwei Chen
Ziling Liu
Zitong Huang
Mingqi Gao
Feng Zheng
206
0
0
30 Jun 2025
Grounding DINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models
IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control (IEEE TUFFC), 2025
Hamza Rasaee
Taha Koleilat
H. Rivaz
224
2
0
30 Jun 2025
Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data
Shubhabrata Mukherjee
Jack Lang
Obeen Kwon
I. Zenyuk
Valerie Brogden
Adam Weber
D. Ushizima
VLM
168
2
0
30 Jun 2025
OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions
Yuanhao Cai
Chentao Song
Xi Chen
Jinbo Xing
Yiwei Hu
...
Tianyu Wang
Y. Zhang
Xiaokang Yang
Zhe Lin
Alan Yuille
DiffM, VGen
297
5
0
29 Jun 2025
ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts
Xiaoqi Wang
Clint Sebastian
Wenbin He
Liu Ren
VLM
268
0
0
27 Jun 2025
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans
Shubhankar Borse
Seokeon Choi
S. Park
J. Kim
Shreya Kadambi
Risheek Garrepalli
Sungrack Yun
Munawar Hayat
Fatih Porikli
EGVM, VLM
280
2
0
25 Jun 2025
Evaluating the Robustness of Open-Source Vision-Language Models to Domain Shift in Object Captioning
Federico Tavella
Amber Drinkwater
Angelo Cangelosi
94
0
0
24 Jun 2025
OmniGen2: Exploration to Advanced Multimodal Generation
Chenyuan Wu
PengFei Zheng
Ruiran Yan
Shitao Xiao
Xin Luo
...
Defu Lian
X. Wang
Zhongyuan Wang
Tiejun Huang
Zheng Liu
MLLM, SyDa, VLM
333
173
0
23 Jun 2025
RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking
Teng Guo
Jingjin Yu
3DPC, 3DV
233
1
0
20 Jun 2025
Co-Seg++: Mutual Prompt-Guided Collaborative Learning for Versatile Medical Segmentation
Qing Xu
Yuxiang Luo
Wenting Duan
Daming Gao
275
3
0
20 Jun 2025
DIGMAPPER: A Modular System for Automated Geologic Map Digitization
Weiwei Duan
Michael P. Gerlek
Steven Minton
Craig A. Knoblock
Fandel Lin
...
Leeje Jang
Sofia Kirsanova
Zekun Li
Yijun Lin
Yao-Yi Chiang
AI4CE
153
2
0
19 Jun 2025
ControlVLA: Few-shot Object-centric Adaptation for Pre-trained Vision-Language-Action Models
Puhao Li
Yingying Wu
Ziheng Xi
Wanlin Li
Yuzhe Huang
...
Yinghan Chen
Jianan Wang
Song-Chun Zhu
Tengyu Liu
Siyuan Huang
LM&Ro
212
20
0
19 Jun 2025
NTIRE 2025 Image Shadow Removal Challenge Report
Florin-Alexandru Vasluianu
Tim Seizinger
Z. Zhou
C. L. Philip Chen
Zongwei Wu
...
Suiyi Zhao
Bo Wang
Yan Luo
M. Y. Wang
Yilin Zhang
218
22
0
18 Jun 2025
SynPo: Boosting Training-Free Few-Shot Medical Segmentation via High-Quality Negative Prompts
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Yufei Liu
Haoke Xiao
Jiaxing Chai
Yongcun Zhang
Rong Wang
Zijie Meng
Shaozi Li
MedIm, VLM
161
0
0
18 Jun 2025
Open-World Object Counting in Videos
Niki Amini-Naieni
Andrew Zisserman
178
1
0
18 Jun 2025
MCOO-SLAM: A Multi-Camera Omnidirectional Object SLAM System
Miaoxin Pan
Jinnan Li
Yaowen Zhang
Yi Yang
Yufeng Yue
152
1
0
18 Jun 2025
Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins
Chuanruo Ning
Kuan Fang
Wei-Chiu Ma
LM&Ro, AI4CE
235
5
0
16 Jun 2025
A Point Cloud Completion Approach for the Grasping of Partially Occluded Objects and Its Applications in Robotic Strawberry Harvesting
Ali Abouzeid
Malak Mansour
Chengsong Hu
Dezhen Song
149
1
0
16 Jun 2025
A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects
Guohuan Xie
Syed Ariff Syed Hesham
Wenya Guo
Bing Li
Ming-Ming Cheng
Guolei Sun
Yun-Hai Liu
177
1
0
16 Jun 2025
DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding
Thomas Kreutz
M. Mühlhäuser
Alejandro Sánchez Guinea
275
0
0
16 Jun 2025
Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors
Wen-Hsuan Chu
Lei Ke
Jianmeng Liu
Mingxiao Huo
P. Tokmakov
Katerina Fragkiadaki
3DGS
218
0
0
15 Jun 2025
DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning
Yifeng Gao
Yifan Ding
Hongyu Su
Juncheng Li
Yunhan Zhao
...
Li Wang
Xin Wang
Yixu Wang
Jiabo He
Yu-Gang Jiang
VGen
348
1
0
13 Jun 2025
In-Hand Object Pose Estimation via Visual-Tactile Fusion
Felix Nonnengießer
Alap Kshirsagar
Boris Belousov
Jan Peters
293
2
0
12 Jun 2025
Constrained Diffusion Models for Synthesizing Representative Power Flow Datasets
Milad Hoseinpour
Vladimir Dvorkin
DiffM, MedIm
243
0
0
12 Jun 2025
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
Computer Vision and Pattern Recognition (CVPR), 2025
Ning Gao
Yilun Chen
Shuai Yang
Xinyi Chen
Yang Tian
Hao Li
Haifeng Huang
Hanqing Wang
Tai Wang
Jiangmiao Pang
LM&Ro
350
6
0
12 Jun 2025
DreamActor-H1: High-Fidelity Human-Product Demonstration Video Generation via Motion-designed Diffusion Transformers
Lizhen Wang
Zhurong Xia
T. Hu
P. Wang
Pengfei Wang
Zerong Zheng
Ming Zhou
Yuan Zhang
Mingyuan Gao
DiffM, VGen
445
9
0
12 Jun 2025
Efficient Part-level 3D Object Generation via Dual Volume Packing
Jiaxiang Tang
Ruijie Lu
Zhaoshuo Li
Zekun Hao
Xuan Li
Fangyin Wei
Shuran Song
Gang Zeng
Ming-Yu Liu
Tsung-Yi Lin
OCL
313
16
0
11 Jun 2025
HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation
Ziyao Huang
Zixiang Zhou
Juan Cao
Yifeng Ma
Yi Chen
...
Hongmei Wang
Qin Lin
Yuan Zhou
Qinglin Lu
Fan Tang
VGen
225
5
0
10 Jun 2025
iTACO: Interactable Digital Twins of Articulated Objects from Casually Captured RGBD Videos
Weikun Peng
Jun Lv
Cewu Lu
Manolis Savva
322
2
0
10 Jun 2025
Segment Concealed Objects with Incomplete Supervision
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Chunming He
Kai Li
Yachao Zhang
Ziyun Yang
Youwei Pang
...
Chengyu Fang
Yulun Zhang
Linghe Kong
Xiu Li
Sina Farsiu
237
6
0
10 Jun 2025