SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2024
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
    VLM, MLLM
arXiv: 2408.00714

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 859 papers shown
ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking
Tingyang Zhang
Chen Wang
Bushi Liu
Qingzhe Gao
Jiahui Lei
Baoquan Chen
Lingjie Liu
3DV
06 Jan 2025
SAM-EM: Real-Time Segmentation for Automated Liquid Phase Transmission Electron Microscopy
Alexander Wang
Max Xu
Risha Goel
Zain Shabeeb
Isabel Panicker
Vida Jamali
VLM
06 Jan 2025
Soft and Compliant Contact-Rich Hair Manipulation and Care
IEEE/ACM International Conference on Human-Robot Interaction (HRI), 2025
Uksang Yoo
N. Dennler
Eliot Xing
Maja J. Matarić
Stefanos Nikolaidis
Jeffrey Ichnowski
Jean Oh
05 Jan 2025
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
Yuanpeng Tu
Hao Luo
Xi Chen
S. Ji
Xiang Bai
Hengshuang Zhao
DiffM, VGen
02 Jan 2025
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Computer Vision and Pattern Recognition (CVPR), 2024
Yuqian Yuan
Hang Zhang
Wentong Li
Zesen Cheng
Boqiang Zhang
...
Deli Zhao
Wenqiao Zhang
Yueting Zhuang
Jianke Zhu
Lidong Bing
31 Dec 2024
ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation
Ting Zhang
Zhiqiang Yuan
Yeshuang Zhu
Jinchao Zhang
DiffM
31 Dec 2024
Gaussian Building Mesh (GBM): Extract a Building's 3D Mesh with Google Earth and Gaussian Splatting
K. Gao
Liangzhi Li
Hongjie He
Dening Lu
Linlin Xu
Jonathan Li
GP, 3DGS
31 Dec 2024
MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation
Haoyu Zheng
Wenqiao Zhang
Zheqi Lv
Yu Zhong
Yang Dai
...
Yongliang Shen
Juncheng Billy Li
Dongping Zhang
Siliang Tang
Yueting Zhuang
DiffM, VGen
31 Dec 2024
BODex: Scalable and Efficient Robotic Dexterous Grasp Synthesis Using Bilevel Optimization
IEEE International Conference on Robotics and Automation (ICRA), 2024
Jiayi Chen
Yubin Ke
Hongan Wang
21 Dec 2024
MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance
Hallee E. Wong
Jose Javier Gonzalez Ortiz
John Guttag
Adrian V. Dalca
19 Dec 2024
M³-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Zixuan Chen
Jiaxin Li
Liming Tan
Yejie Guo
Junxuan Liang
Cewu Lu
Yongqian Li
VOS
18 Dec 2024
Measurement of Medial Elbow Joint Space using Landmark Detection
IEEE Access, 2024
Shizuka Akahori
Shotaro Teruya
Pragyan Shrestha
Yuichi Yoshii
Ryuhei Michinobu
S. Iizuka
I. Kitahara
17 Dec 2024
IGR: Improving Diffusion Model for Garment Restoration from Person Image
Le Shen
Rong Huang
Zhijie Wang
DiffM
16 Dec 2024
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Rick Akkerman
Haiwen Feng
M. Black
Dimitrios Tzionas
Victoria Fernandez-Abrevaya
VGen, AI4CE
16 Dec 2024
Rethinking Detecting Salient and Camouflaged Objects in Unconstrained Scenes
Zhangjun Zhou
Yiping Li
Chunlin Zhong
Jianuo Huang
Jialun Pei
He Tang
14 Dec 2024
Agtech Framework for Cranberry-Ripening Analysis Using Vision Foundation Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Faith Johnson
Ryan Meegan
Jack Lowry
Peter Oudemans
Kristin J. Dana
12 Dec 2024
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Computer Vision and Pattern Recognition (CVPR), 2024
Yue Chen
Xingyu Chen
Anpei Chen
Gerard Pons-Moll
Yuliang Xiu
3DGS
12 Dec 2024
Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting
Guangben Lu
Yuzhen Du
Zhimin Sun
Ran Yi
Yifan Qi
Yizhe Tang
Tianyi Wang
Lizhuang Ma
Fangyuan Zou
DiffM
05 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Mingyu Ding
Xihui Liu
LLMAG, LRM
05 Dec 2024
Referring Video Object Segmentation via Language-aligned Track Selection
Seongchan Kim
Woojeong Jin
Sangbeom Lim
Heeji Yoon
Hyunwook Choi
Seungryong Kim
VOS
02 Dec 2024
T-3DGS: Removing Transient Objects for 3D Scene Reconstruction
Vadim Pryadilshchikov
Alexander Markin
Artem Komarichev
Ruslan Rakhimov
Peter Wonka
Evgeny Burnaev
3DGS
29 Nov 2024
Autonomous Imagination: Closed-Loop Decomposition of Visual-to-Textual Conversion in Visual Reasoning for Multimodal Large Language Models
Qingbin Liu
Yumeng Li
Boyuan Xiao
Yichang Jian
Ziang Qin
Tianjia Shao
Yao-Xiang Ding
Kun Zhou
LRM, MLLM
27 Nov 2024
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
Xinyu Hou
Zongsheng Yue
Xiaoming Li
Chen Change Loy
VGen, DiffM
26 Nov 2024
VideoDirector: Precise Video Editing via Text-to-Video Models
Computer Vision and Pattern Recognition (CVPR), 2024
Yukun Wang
Longguang Wang
Zhiyuan Ma
Qibin Hu
Kai Xu
Yulan Guo
VGen, DiffM
26 Nov 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
26 Nov 2024
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Bastian Wittmann
Yannick Wattenberg
Tamaz Amiranashvili
Antonio Terpin
Bjoern Menze
26 Nov 2024
Leveraging Foundation Models To learn the shape of semi-fluid deformable objects
Omar El Assal
Carlos M. Mateo
Sebastien Ciron
David Fofi
25 Nov 2024
Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Bhuvan Sachdeva
Naren Akash
Tajamul Ashraf
Simon Muller
T. Schultz
M. Wintergerst
Niharika Singri Prasad
K. Murali
Mohit Jain
25 Nov 2024
Language Driven Occupancy Prediction
Zhu Yu
Bowen Pang
Lizhe Liu
Runmin Zhang
Qihao Peng
Maochun Luo
Mingxia Chen
Si-Yuan Cao
Hui-Liang Shen
25 Nov 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
25 Nov 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Computer Vision and Pattern Recognition (CVPR), 2024
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
25 Nov 2024
Generative Omnimatte: Learning to Decompose Video into Layers
Computer Vision and Pattern Recognition (CVPR), 2024
Yao-Chih Lee
Erika Lu
Sarah Rumbley
Michal Geyer
Jia-Bin Huang
Tali Dekel
Forrester Cole
DiffM, VGen
25 Nov 2024
There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks
Miguel Espinosa
Chenhongyi Yang
Linus Ericsson
Jingyu Sun
Elliot J. Crowley
VLM
22 Nov 2024
VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing
Jiahao Hu
Tianxiong Zhong
Xuebo Wang
Boyuan Jiang
Xingye Tian
Fei Yang
Pengfei Wan
Di Zhang
VGen
22 Nov 2024
Segment Anything in Light Fields for Real-Time Applications via Constrained Prompting
Nikolai Goncharov
Donald G. Dansereau
VLM
21 Nov 2024
Learning Generalizable 3D Manipulation With 10 Demonstrations
Yu Ren
Yang Cong
Ronghan Chen
Jiahao Long
SSL
15 Nov 2024
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Computer Vision and Pattern Recognition (CVPR), 2024
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Lin Wang
Joey Tianyi Zhou
Chen Chen
LRM
15 Nov 2024
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
15 Nov 2024
OneNet: A Channel-Wise 1D Convolutional U-Net
Sanghyun Byun
Kayvan Shah
Ayushi Gang
Christopher Apton
Jacob Song
Woo Seong Chung
SSeg
14 Nov 2024
Watermark Anything with Localized Messages
International Conference on Learning Representations (ICLR), 2024
Tom Sander
Pierre Fernandez
Alain Durmus
Teddy Furon
Matthijs Douze
VLM
11 Nov 2024
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Shehan Munasinghe
Hanan Gani
Wenqi Zhu
Jiale Cao
Eric P. Xing
Fahad Shahbaz Khan
Salman Khan
MLLM, VGen, VLM
07 Nov 2024
ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
International Conference on Learning Representations (ICLR), 2024
Chenrui Tie
Yue Chen
Kai Cheng
Boxuan Dong
Zhiyu Li
Chongkai Gao
Hao Dong
06 Nov 2024
MultiDepth: Multi-Sample Priors for Refining Monocular Metric Depth Estimations in Indoor Scenes
Sanghyun Byun
Jacob Song
Woo Seong Chung
MDE
01 Nov 2024
ZIM: Zero-Shot Image Matting for Anything
Beomyoung Kim
Chanyong Shin
Joonhyun Jeong
Hyungsik Jung
Se Yun Lee
Sewhan Chun
Dong-Hyun Hwang
Joonsang Yu
VLM
01 Nov 2024
TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
Neural Information Processing Systems (NeurIPS), 2024
Sunjae Yoon
Gwanhyeong Koo
Younghwan Lee
Chang D. Yoo
VGen
31 Oct 2024
EchoFM: Foundation Model for Generalizable Echocardiogram Analysis
IEEE Transactions on Medical Imaging (IEEE TMI), 2024
Sekeun Kim
Pengfei Jin
Qing Xiao
Cheng Chen
Yiwei Li
Hui Ren
Xiang Li
Tianming Liu
Quanzheng Li
30 Oct 2024
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi
Zhipeng Bao
Yu-Xiong Wang
P. Tokmakov
Martial Hebert
VOS
30 Oct 2024
Addressing Issues with Working Memory in Video Object Segmentation
Clayton Bromley
Alexander Moore
Amar Saini
Douglas Poland
Carmen Carrano
VOS
29 Oct 2024
MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis
Di Qiu
Zheng Chen
Rui Wang
Mingyuan Fan
Changqian Yu
Junshi Huan
Xiang Wen
VGen
28 Oct 2024
Frontiers in Intelligent Colonoscopy
Ge-Peng Ji
Jingyi Liu
Peng Xu
Nick Barnes
Fahad Shahbaz Khan
Salman Khan
Deng-Ping Fan
22 Oct 2024