arXiv:2408.00714
SAM 2: Segment Anything in Images and Videos
International Conference on Learning Representations (ICLR), 2024
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM
MLLM
Papers citing "SAM 2: Segment Anything in Images and Videos" (50 of 859 shown)
ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking
Tingyang Zhang
Chen Wang
Bushi Liu
Qingzhe Gao
Jiahui Lei
Baoquan Chen
Lingjie Liu
3DV
06 Jan 2025
SAM-EM: Real-Time Segmentation for Automated Liquid Phase Transmission Electron Microscopy
Alexander Wang
Max Xu
Risha Goel
Zain Shabeeb
Isabel Panicker
Vida Jamali
VLM
06 Jan 2025
Soft and Compliant Contact-Rich Hair Manipulation and Care
IEEE/ACM International Conference on Human-Robot Interaction (HRI), 2025
Uksang Yoo
N. Dennler
Eliot Xing
Maja J. Matarić
Stefanos Nikolaidis
Jeffrey Ichnowski
Jean Oh
05 Jan 2025
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
Yuanpeng Tu
Hao Luo
Xi Chen
S. Ji
Xiang Bai
Hengshuang Zhao
DiffM
VGen
02 Jan 2025
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Computer Vision and Pattern Recognition (CVPR), 2024
Yuqian Yuan
Hang Zhang
Wentong Li
Zesen Cheng
Boqiang Zhang
...
Deli Zhao
Wenqiao Zhang
Yueting Zhuang
Jianke Zhu
Lidong Bing
31 Dec 2024
ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation
Ting Zhang
Zhiqiang Yuan
Yeshuang Zhu
Jinchao Zhang
DiffM
31 Dec 2024
Gaussian Building Mesh (GBM): Extract a Building's 3D Mesh with Google Earth and Gaussian Splatting
K. Gao
Liangzhi Li
Hongjie He
Dening Lu
Linlin Xu
Jonathan Li
GP
3DGS
31 Dec 2024
MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation
Haoyu Zheng
Wenqiao Zhang
Zheqi Lv
Yu Zhong
Yang Dai
...
Yongliang Shen
Juncheng Billy Li
Dongping Zhang
Siliang Tang
Yueting Zhuang
DiffM
VGen
31 Dec 2024
BODex: Scalable and Efficient Robotic Dexterous Grasp Synthesis Using Bilevel Optimization
IEEE International Conference on Robotics and Automation (ICRA), 2024
Jiayi Chen
Yubin Ke
Hongan Wang
21 Dec 2024
MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance
Hallee E. Wong
Jose Javier Gonzalez Ortiz
John Guttag
Adrian V. Dalca
19 Dec 2024
M³-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Zixuan Chen
Jiaxin Li
Liming Tan
Yejie Guo
Junxuan Liang
Cewu Lu
Yongqian Li
VOS
18 Dec 2024
Measurement of Medial Elbow Joint Space using Landmark Detection
IEEE Access, 2024
Shizuka Akahori
Shotaro Teruya
Pragyan Shrestha
Yuichi Yoshii
Ryuhei Michinobu
S. Iizuka
I. Kitahara
17 Dec 2024
IGR: Improving Diffusion Model for Garment Restoration from Person Image
Le Shen
Rong Huang
Zhijie Wang
DiffM
16 Dec 2024
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Rick Akkerman
Haiwen Feng
M. Black
Dimitrios Tzionas
Victoria Fernandez-Abrevaya
VGen
AI4CE
16 Dec 2024
Rethinking Detecting Salient and Camouflaged Objects in Unconstrained Scenes
Zhangjun Zhou
Yiping Li
Chunlin Zhong
Jianuo Huang
Jialun Pei
He Tang
14 Dec 2024
Agtech Framework for Cranberry-Ripening Analysis Using Vision Foundation Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Faith Johnson
Ryan Meegan
Jack Lowry
Peter Oudemans
Kristin J. Dana
12 Dec 2024
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Computer Vision and Pattern Recognition (CVPR), 2024
Yue Chen
Xingyu Chen
Anpei Chen
Gerard Pons-Moll
Yuliang Xiu
3DGS
12 Dec 2024
Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting
Guangben Lu
Yuzhen Du
Zhimin Sun
Ran Yi
Yifan Qi
Yizhe Tang
Tianyi Wang
Lizhuang Ma
Fangyuan Zou
DiffM
05 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Mingyu Ding
Xihui Liu
LLMAG
LRM
05 Dec 2024
Referring Video Object Segmentation via Language-aligned Track Selection
Seongchan Kim
Woojeong Jin
Sangbeom Lim
Heeji Yoon
Hyunwook Choi
Seungryong Kim
VOS
02 Dec 2024
T-3DGS: Removing Transient Objects for 3D Scene Reconstruction
Vadim Pryadilshchikov
Alexander Markin
Artem Komarichev
Ruslan Rakhimov
Peter Wonka
Evgeny Burnaev
3DGS
29 Nov 2024
Autonomous Imagination: Closed-Loop Decomposition of Visual-to-Textual Conversion in Visual Reasoning for Multimodal Large Language Models
Qingbin Liu
Yumeng Li
Boyuan Xiao
Yichang Jian
Ziang Qin
Tianjia Shao
Yao-Xiang Ding
Kun Zhou
LRM
MLLM
27 Nov 2024
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis
Xinyu Hou
Zongsheng Yue
Xiaoming Li
Chen Change Loy
VGen
DiffM
26 Nov 2024
VideoDirector: Precise Video Editing via Text-to-Video Models
Computer Vision and Pattern Recognition (CVPR), 2024
Yukun Wang
Longguang Wang
Zhiyuan Ma
Qibin Hu
Kai Xu
Yulan Guo
VGen
DiffM
26 Nov 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
26 Nov 2024
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Bastian Wittmann
Yannick Wattenberg
Tamaz Amiranashvili
Antonio Terpin
Bjoern Menze
26 Nov 2024
Leveraging Foundation Models To learn the shape of semi-fluid deformable objects
Omar El Assal
Carlos M. Mateo
Sebastien Ciron
David Fofi
25 Nov 2024
Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2024
Bhuvan Sachdeva
Naren Akash
Tajamul Ashraf
Simon Muller
T. Schultz
M. Wintergerst
Niharika Singri Prasad
K. Murali
Mohit Jain
25 Nov 2024
Language Driven Occupancy Prediction
Zhu Yu
Bowen Pang
Lizhe Liu
Runmin Zhang
Qihao Peng
Maochun Luo
Mingxia Chen
Si-Yuan Cao
Hui-Liang Shen
25 Nov 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
25 Nov 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Computer Vision and Pattern Recognition (CVPR), 2024
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
25 Nov 2024
Generative Omnimatte: Learning to Decompose Video into Layers
Computer Vision and Pattern Recognition (CVPR), 2024
Yao-Chih Lee
Erika Lu
Sarah Rumbley
Michal Geyer
Jia-Bin Huang
Tali Dekel
Forrester Cole
DiffM
VGen
25 Nov 2024
There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks
Miguel Espinosa
Chenhongyi Yang
Linus Ericsson
Jingyu Sun
Elliot J. Crowley
VLM
22 Nov 2024
VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing
Jiahao Hu
Tianxiong Zhong
Xuebo Wang
Boyuan Jiang
Xingye Tian
Fei Yang
Pengfei Wan
Di Zhang
VGen
22 Nov 2024
Segment Anything in Light Fields for Real-Time Applications via Constrained Prompting
Nikolai Goncharov
Donald G. Dansereau
VLM
21 Nov 2024
Learning Generalizable 3D Manipulation With 10 Demonstrations
Yu Ren
Yang Cong
Ronghan Chen
Jiahao Long
SSL
15 Nov 2024
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Computer Vision and Pattern Recognition (CVPR), 2024
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Lin Wang
Joey Tianyi Zhou
Chen Chen
LRM
15 Nov 2024
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
15 Nov 2024
OneNet: A Channel-Wise 1D Convolutional U-Net
Sanghyun Byun
Kayvan Shah
Ayushi Gang
Christopher Apton
Jacob Song
Woo Seong Chung
SSeg
14 Nov 2024
Watermark Anything with Localized Messages
International Conference on Learning Representations (ICLR), 2024
Tom Sander
Pierre Fernandez
Alain Durmus
Teddy Furon
Matthijs Douze
VLM
11 Nov 2024
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Shehan Munasinghe
Hanan Gani
Wenqi Zhu
Jiale Cao
Eric P. Xing
Fahad Shahbaz Khan
Salman Khan
MLLM
VGen
VLM
07 Nov 2024
ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
International Conference on Learning Representations (ICLR), 2024
Chenrui Tie
Yue Chen
Kai Cheng
Boxuan Dong
Zhiyu Li
Chongkai Gao
Hao Dong
06 Nov 2024
MultiDepth: Multi-Sample Priors for Refining Monocular Metric Depth Estimations in Indoor Scenes
Sanghyun Byun
Jacob Song
Woo Seong Chung
MDE
01 Nov 2024
ZIM: Zero-Shot Image Matting for Anything
Beomyoung Kim
Chanyong Shin
Joonhyun Jeong
Hyungsik Jung
Se Yun Lee
Sewhan Chun
Dong-Hyun Hwang
Joonsang Yu
VLM
01 Nov 2024
TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
Neural Information Processing Systems (NeurIPS), 2024
Sunjae Yoon
Gwanhyeong Koo
Younghwan Lee
Chang D. Yoo
VGen
31 Oct 2024
EchoFM: Foundation Model for Generalizable Echocardiogram Analysis
IEEE Transactions on Medical Imaging (IEEE TMI), 2024
Sekeun Kim
Pengfei Jin
Qing Xiao
Cheng Chen
Yiwei Li
Hui Ren
Xiang Li
Tianming Liu
Quanzheng Li
30 Oct 2024
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi
Zhipeng Bao
Yu-Xiong Wang
P. Tokmakov
Martial Hebert
VOS
30 Oct 2024
Addressing Issues with Working Memory in Video Object Segmentation
Clayton Bromley
Alexander Moore
Amar Saini
Douglas Poland
Carmen Carrano
VOS
29 Oct 2024
MovieCharacter: A Tuning-Free Framework for Controllable Character Video Synthesis
Di Qiu
Zheng Chen
Rui Wang
Mingyuan Fan
Changqian Yu
Junshi Huan
Xiang Wen
VGen
28 Oct 2024
Frontiers in Intelligent Colonoscopy
Ge-Peng Ji
Jingyi Liu
Peng Xu
Nick Barnes
Fahad Shahbaz Khan
Salman Khan
Deng-Ping Fan
22 Oct 2024