SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2024
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM MLLM

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 861 papers shown
High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight
Computer Vision and Pattern Recognition (CVPR), 2025
Cédric Vincent
Taehyoung Kim
Henri Meeß
256
2
0
19 Mar 2025
EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
Boshen Xu
Yuting Mei
Xinbi Liu
Sipeng Zheng
Qin Jin
VLM MDE
546
2
0
19 Mar 2025
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
Computer Vision and Pattern Recognition (CVPR), 2025
Shengqiong Wu
Hao Fei
Jingkang Yang
Xiaochen Li
Juncheng Li
Hao Zhang
Tat-Seng Chua
307
4
0
19 Mar 2025
Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
Nvidia
Hassan Abu Alhaija
Jose M. Alvarez
Maciej Bala
Tiffany Cai
...
Yuchong Ye
Xiaodong Yang
Boxin Wang
Fangyin Wei
Yu Zeng
VGen
521
42
0
18 Mar 2025
AUTV: Creating Underwater Video Datasets with Pixel-wise Annotations
Quang-Trung Truong
Wong Yuk Kwan
Duc Thanh Nguyen
Binh-Son Hua
Sai-Kit Yeung
VGen
318
1
0
17 Mar 2025
SAM2 for Image and Video Segmentation: A Comprehensive Survey
Zhang Jiaxing
Tang Hao
VLM
355
14
0
17 Mar 2025
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
Dewei Zhou
Mingwei Li
Zongxin Yang
Yi Yang
491
15
0
17 Mar 2025
SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint
Zhenlong Yuan
Zhidong Yang
Yujun Cai
Kuangxin Wu
Mufan Liu
Dapeng Zhang
Hao Jiang
Zhaoxin Li
Zhaoqi Wang
325
15
0
17 Mar 2025
VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility
Yitian Shi
Di Wen
Guanqi Chen
Edgar Welte
Sheng Liu
Kunyu Peng
Rainer Stiefelhagen
Rania Rayyes
374
8
0
16 Mar 2025
SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs
Computer Vision and Pattern Recognition (CVPR), 2025
Guibiao Liao
Qing Li
Zhenyu Bao
Guoping Qiu
Kanglin Liu
3DGS
239
2
0
16 Mar 2025
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
Zilun Zhang
Haozhan Shen
Tiancheng Zhao
Bin Chen
Zian Guan
Yuhao Wang
Xu Jia
Yuhao Wang
Yongheng Shang
Yuxiang Cai
212
4
0
16 Mar 2025
SPOC: Spatially-Progressing Object State Change Segmentation in Video
Priyanka Mandikal
Tushar Nagarajan
Alex Stoken
Zihui Xue
Kristen Grauman
258
1
0
15 Mar 2025
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
Ruijie Lu
Yixin Chen
Yu Liu
Jiaxiang Tang
Junfeng Ni
Diwen Wan
Gang Zeng
Siyuan Huang
DiffM VGen
464
9
0
15 Mar 2025
ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis
Yu Fang
Yue Yang
Xinghao Zhu
Kaiyuan Zheng
Gedas Bertasius
D. Szafir
Mingyu Ding
268
18
0
15 Mar 2025
ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object
Computer Vision and Pattern Recognition (CVPR), 2025
Zhe Shan
Yang Liu
Lei Zhou
C. Yan
Haoyu Wang
Xia Xie
243
15
0
15 Mar 2025
PSF-4D: A Progressive Sampling Framework for View Consistent 4D Editing
H. Iqbal
Nazmul Karim
Umar Khalid
Azib Farooq
Z. Zhong
Jing Hua
Chen Chen
DiffM 3DGS VGen
434
0
0
14 Mar 2025
EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting
Di Li
Jie Feng
Jiahao Chen
Weisheng Dong
Guanbin Li
G. Shi
Licheng Jiao
3DGS VLM
849
1
0
14 Mar 2025
Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling
Christopher Xie
A. Avetisyan
Henry Howard-Jenkins
Yawar Siddiqui
Julian Straub
Richard Newcombe
Vasileios Balntas
Jakob Julian Engel
3DH 3DV
411
1
0
14 Mar 2025
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
452
3
0
13 Mar 2025
The Power of One: A Single Example is All it Takes for Segmentation in VLMs
Mir Rayat Imtiaz Hossain
Mennatullah Siam
Leonid Sigal
James J. Little
MLLM VLM
578
2
0
13 Mar 2025
IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models
Yiyang Ling
Karan Owalekar
Oluwatobiloba Adesanya
Erdem Bıyık
Daniel Seita
346
5
0
13 Mar 2025
Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
Haoxuan Li
Sixu Yan
Yongqian Li
Xinggang Wang
LM&Ro
333
2
0
13 Mar 2025
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Computer Vision and Pattern Recognition (CVPR), 2025
Yancheng Cai
Fei Yin
Dounia Hammou
Rafal Mantiuk
VLM
458
7
0
13 Mar 2025
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
R. Hu
Lianghui Zhu
Yuxuan Zhang
Tianheng Cheng
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
ObjD
498
5
0
13 Mar 2025
LuciBot: Automated Robot Policy Learning from Generated Videos
Xiaowen Qiu
Yian Wang
Jiting Cai
Zhehuan Chen
Chunru Lin
Tsun-Hsuan Wang
Chuang Gan
LM&Ro VGen
318
2
0
12 Mar 2025
PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop
Chenyu Li
Oscar Michel
Xichen Pan
Sainan Liu
Mike Roberts
Saining Xie
VGen
229
24
0
12 Mar 2025
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
Marvin Heidinger
Snehal Jauhri
V. Prasad
Georgia Chalvatzaki
327
3
0
12 Mar 2025
V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video
Jianqi Chen
Biao Zhang
Xiangjun Tang
Peter Wonka
VGen
304
15
0
11 Mar 2025
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
Yansong Guo
Jie Hu
Yansong Qu
Liujuan Cao
3DGS
1.1K
6
0
11 Mar 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
932
12
0
11 Mar 2025
FAM-HRI: Foundation-Model Assisted Multi-Modal Human-Robot Interaction Combining Gaze and Speech
Yuzhi Lai
Shenghai Yuan
Boya Zhang
Benjamin Kiefer
Peizheng Li
Tianchen Deng
Andreas Zell
202
8
0
11 Mar 2025
MetaFold: Language-Guided Multi-Category Garment Folding Framework via Trajectory Generation and Foundation Model
Haonan Chen
Junxiao Li
Kai Cheng
Yiwei Liu
Yiwen Hou
...
Chongkai Gao
Zhenyu Wei
Shensi Xu
Jiaqi Huang
Lin Shao
AI4CE
268
4
0
11 Mar 2025
VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion
Lehan Yang
Jincen Song
Tianlong Wang
Daiqing Qi
Weili Shi
Yuheng Liu
Sheng Li
DiffM VOS VGen
318
1
0
11 Mar 2025
YOLOE: Real-Time Seeing Anything
Ao Wang
Lihao Liu
Hui Chen
Zijia Lin
Jiawei Han
Guiguang Ding
VLM ObjD
549
34
0
10 Mar 2025
RS2-SAM2: Customized SAM2 for Referring Remote Sensing Image Segmentation
Fu Rong
Meng Lan
Qian Zhang
Guang Dai
523
1
0
10 Mar 2025
OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation
Ding Zhong
Xu Zheng
Chenfei Liao
Yuanhuiyi Lyu
Jialei Chen
Shengyang Wu
Linfeng Zhang
Xuming Hu
VLM
443
19
0
10 Mar 2025
MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation
Chenfei Liao
Xu Zheng
Yuanhuiyi Lyu
Haiwei Xue
Yihong Cao
Jiawen Wang
Kailun Yang
Xuming Hu
VLM
467
11
0
09 Mar 2025
SAQ-SAM: Semantically-Aligned Quantization for Segment Anything Model
Jing Zhang
Zhiyu Li
Chengzhi Hu
Xuewen Liu
Qingyi Gu
VLM MQ
230
0
0
09 Mar 2025
Online Dense Point Tracking with Streaming Memory
Qiaole Dong
Yanwei Fu
331
1
0
09 Mar 2025
Improving SAM for Camouflaged Object Detection via Dual Stream Adapters
Jiaming Liu
Linghe Kong
Guihai Chen
325
2
0
08 Mar 2025
Differentiable Rendering-based Pose Estimation for Surgical Robotic Instruments
Zekai Liang
Zih-Yun Chiu
Florian Richter
Michael C. Yip
MedIm
181
6
0
07 Mar 2025
Instrument-Splatting: Controllable Photorealistic Reconstruction of Surgical Instruments Using Gaussian Splatting
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Shuojue Yang
Zijian Wu
Mingxuan Hong
Qian Li
Daiyun Shen
Septimiu E. Salcudean
Yueming Jin
3DGS
267
4
0
06 Mar 2025
Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025
Beverley Gorry
Tobias Fischer
Michael Milford
Alejandro Fontan
366
1
0
06 Mar 2025
Shaken, Not Stirred: A Novel Dataset for Visual Understanding of Glasses in Human-Robot Bartending Tasks
Lukás Gajdosech
Hassan Ali
Jan-Gerrit Habekost
Martin Madaras
Matthias Kerzel
Stefan Wermter
357
0
0
06 Mar 2025
Surgical Gaussian Surfels: Highly Accurate Real-time Surgical Scene Rendering using Gaussian Surfels
Idris O. Sunmola
Zhenjun Zhao
Samuel Schmidgall
Yumeng Wang
Paul Maria Scheikl
A. Krieger
Axel Krieger
3DGS
276
4
0
06 Mar 2025
Conformal In-Context Reverse Classification Accuracy: Efficient Estimation of Segmentation Quality with Statistical Guarantees
Matias Cosarinsky
Ramiro Billot
Lucas Mansilla
Gabriel Gimenez
Nicolás Gaggion
Guanghui Fu
Tom Tirer
Enzo Ferrante
441
1
0
06 Mar 2025
WeakMedSAM: Weakly-Supervised Medical Image Segmentation via SAM with Sub-Class Exploration and Prompt Affinity Mining
IEEE Transactions on Medical Imaging (IEEE TMI), 2025
Haoran Wang
Lian Huai
Wenbin Li
Lei Qi
Xingqun Jiang
Yinghuan Shi
MedIm
390
10
0
06 Mar 2025
CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
Arthur Zhang
Harshit S. Sikchi
Amy Zhang
Joydeep Biswas
329
6
0
05 Mar 2025
SurgiSAM2: Fine-tuning a foundational model for surgical video anatomy segmentation and detection
Devanish N. Kamtam
Joseph B. Shrager
Satya Deepya Malla
Xiaohan Wang
Nicole Lin
Juan J. Cardona
Serena Yeung-Levy
Clarence Hu
VLM
215
3
0
05 Mar 2025
AirExo-2: Scaling up Generalizable Robotic Imitation Learning with Low-Cost Exoskeletons
Hongjie Fang
Chenxi Wang
Yiming Wang
J. Chen
Shangning Xia
...
Xinyu Zhan
Lixin Yang
Weiming Wang
Cewu Lu
Hao-Shu Fang
529
14
0
05 Mar 2025