2408.00714
SAM 2: Segment Anything in Images and Videos
International Conference on Learning Representations (ICLR), 2025
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM
MLLM
Papers citing "SAM 2: Segment Anything in Images and Videos"
50 / 824 papers shown
UniVideo: Unified Understanding, Generation, and Editing for Videos
Cong Wei
Quande Liu
Zixuan Ye
Qiulin Wang
Xintao Wang
Pengfei Wan
Kun Gai
Wenhu Chen
VGen
221
10
0
09 Oct 2025
DexMan: Learning Bimanual Dexterous Manipulation from Human and Generated Videos
Jhen Hsieh
Kuan-Hsun Tu
Kuo-Han Hung
Tsung-Wei Ke
124
0
0
09 Oct 2025
R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
Xiuwei Xu
Angyuan Ma
Hankun Li
Bingyao Yu
Zheng Zhu
Jie Zhou
Jiwen Lu
126
0
0
09 Oct 2025
TTOM: Test-Time Optimization and Memorization for Compositional Video Generation
Leigang Qu
Ziyang Wang
Na Zheng
Wenjie Wang
Liqiang Nie
Tat-Seng Chua
142
1
0
09 Oct 2025
ConPoSe: LLM-Guided Contact Point Selection for Scalable Cooperative Object Pushing
Noah Steinkrüger
Nisarga Nilavadi
Wolfram Burgard
Tanja Katharina Kaiser
84
0
0
09 Oct 2025
FlexTraj: Image-to-Video Generation with Flexible Point Trajectory Control
Zhiyuan Zhang
Can Wang
Dongdong Chen
Jing Liao
VGen
228
2
0
09 Oct 2025
DynamicEval: Rethinking Evaluation for Dynamic Text-to-Video Synthesis
Nithin C. Babu
Aniruddha Mahapatra
Harsh Rangwani
Rajiv Soundararajan
Kuldeep Kulkarni
EGVM
VGen
157
0
0
08 Oct 2025
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation
Ci-Siang Lin
Min-Hung Chen
I-Jieh Liu
Chien-Yi Wang
Sifei Liu
Yu-Chun Wang
VOS
130
0
0
08 Oct 2025
Addressing the ID-Matching Challenge in Long Video Captioning
Zhantao Yang
Huangji Wang
Ruili Feng
Han Zhang
Yuting Hu
Shangwen Zhu
Junyan Li
Yu Liu
Fan Cheng
96
0
0
08 Oct 2025
Few-Shot Adaptation Benchmark for Remote Sensing Vision-Language Models
K. E. Khoury
Maxime Zanella
Christophe De Vleeschouwer
Benoît Macq
VLM
94
1
0
08 Oct 2025
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Yi Han
Cheng Chi
Enshen Zhou
Shanyu Rong
Jingkun An
Pengwei Wang
Zhongyuan Wang
Lu Sheng
Shanghang Zhang
LRM
192
8
0
08 Oct 2025
Vi-TacMan: Articulated Object Manipulation via Vision and Touch
Leiyao Cui
Zihang Zhao
Sirui Xie
Wenhuan Zhang
Zhi Han
Yixin Zhu
104
0
0
07 Oct 2025
DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation
Taeyeop Lee
Gyuree Kang
Bowen Wen
Y. Kim
S. Back
In So Kweon
David Hyunchul Shim
Kuk-Jin Yoon
112
1
0
07 Oct 2025
Improved High-probability Convergence Guarantees of Decentralized SGD
Aleksandar Armacki
Ali H. Sayed
76
0
0
07 Oct 2025
Human3R: Everyone Everywhere All at Once
Yue Chen
Xingyu Chen
Yuxuan Xue
Anpei Chen
Yuliang Xiu
Gerard Pons-Moll
3DH
3DGS
148
2
0
07 Oct 2025
BioAutoML-NAS: An End-to-End AutoML Framework for Multimodal Insect Classification via Neural Architecture Search on Large-Scale Biodiversity Data
Arefin Ittesafun Abian
Debopom Sutradhar
Md Rafi Ur Rashid
Reem E. Mohamed
M. Islam
Asif Karim
Kheng Cher Yeo
Sami Azam
105
0
0
07 Oct 2025
EmbodiedCoder: Parameterized Embodied Mobile Manipulation via Modern Coding Model
Zefu Lin
Rongxu Cui
Chen Hanning
Xiangyu Wang
Junjia Xu
...
Chen Wenbo
Hui Zhou
Lue Fan
W. Li
Zhaoxiang Zhang
LM&Ro
145
1
0
07 Oct 2025
On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond
Chenxiao Yang
Cai Zhou
David Wipf
Zhiyuan Li
DiffM
149
0
0
07 Oct 2025
Character Mixing for Video Generation
Tingting Liao
Chongjian Ge
Guangyi Liu
Hao Li
Yi Zhou
VGen
93
1
0
06 Oct 2025
SPEGNet: Synergistic Perception-Guided Network for Camouflaged Object Detection
Baber Jan
Saeed Anwar
Aiman El-Maleh
Abdul Jabbar Siddiqui
Abdul Bais
104
0
0
06 Oct 2025
SegMASt3R: Geometry Grounded Segment Matching
Rohit Jayanti
Swayam Agrawal
Vansh Garg
Siddharth Tourani
Muhammad Haris Khan
Sourav Garg
Madhava Krishna
3DV
225
0
0
06 Oct 2025
A multi-modal tactile fingertip design for robotic hands to enhance dexterous manipulation
Zhuowei Xu
Zilin Si
Kevin Zhang
Oliver Kroemer
Zeynep Temel
96
0
0
06 Oct 2025
EmbodiSwap for Zero-Shot Robot Imitation Learning
Eadom Dessalene
P. Mantripragada
Michael Maynord
Yiannis Aloimonos
LM&Ro
80
0
0
04 Oct 2025
Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert
Mingyu Liu
Zheng Huang
Xiaoyi Lin
Huanyi Zheng
Canyu Zhao
Zongze Du
Y. Wang
Haoyi Zhu
Hao Chen
Chunhua Shen
109
0
0
04 Oct 2025
SAMSOD: Rethinking SAM Optimization for RGB-T Salient Object Detection
Zhengyi Liu
Xinrui Wang
Xianyong Fang
Zhengzheng Tu
Linbo Wang
86
0
0
04 Oct 2025
Towards Scalable and Consistent 3D Editing
Ruihao Xia
Yang Tang
Pan Zhou
DiffM
120
2
0
03 Oct 2025
Dynamic Prompt Generation for Interactive 3D Medical Image Segmentation Training
Tidiane Camaret N'dir
Alexander Pfefferle
Robin Tibor Schirrmeister
MedIm
3DH
216
1
0
03 Oct 2025
Med-K2N: Flexible K-to-N Modality Translation for Medical Image Synthesis
Feng Yuan
Yifan Gao
Yuehua Ye
Haoyue Li
Xin Gao
MedIm
64
0
0
03 Oct 2025
Geometry Meets Vision: Revisiting Pretrained Semantics in Distilled Fields
Zhiting Mei
Ola Shorinwa
Anirudha Majumdar
108
1
0
03 Oct 2025
Inferring Dynamic Physical Properties from Video Foundation Models
Guanqi Zhan
Xianzheng Ma
Weidi Xie
Andrew Zisserman
VGen
120
1
0
02 Oct 2025
When Tracking Fails: Analyzing Failure Modes of SAM2 for Point-Based Tracking in Surgical Videos
Woowon Jang
Jiwon Im
Juseung Choi
Niki Rashidian
W. D. Neve
Utku Ozbulak
96
0
0
02 Oct 2025
Holistic Order Prediction in Natural Scenes
Pierre Musacchio
Hyunmin Lee
Jaesik Park
3DV
227
0
0
02 Oct 2025
IMAGEdit: Let Any Subject Transform
Fei Shen
Weihao Xu
Rui Yan
Dong Zhang
Xiangbo Shu
Jinhui Tang
VGen
88
0
0
01 Oct 2025
Instant4D: 4D Gaussian Splatting in Minutes
Zhanpeng Luo
Haoxi Ran
Li Lu
3DGS
VGen
136
1
0
01 Oct 2025
Domain-Specialized Interactive Segmentation Framework for Meningioma Radiotherapy Planning
J. Lee
Han Jang
Kyu Sung Choi
52
0
0
01 Oct 2025
Robust Context-Aware Object Recognition
Klara Janouskova
Cristian Gavrus
Jiří Matas
140
0
0
01 Oct 2025
Affordance-Guided Diffusion Prior for 3D Hand Reconstruction
Naru Suzuki
Takehiko Ohkawa
Tatsuro Banno
Jihyun Lee
Ryosuke Furuta
Yoichi Sato
DiffM
127
1
0
01 Oct 2025
Assessing Foundation Models for Mold Colony Detection with Limited Training Data
Henrik Pichler
Janis Keuper
Matthew Copping
59
0
0
01 Oct 2025
Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline
Haiyang Li
Yaxiong Wang
Lianwei Wu
Lechao Cheng
Zhun Zhong
158
0
0
30 Sep 2025
Cat: Post-Training Quantization Error Reduction via Cluster-based Affine Transformation
Ali Zoljodi
Radu Timofte
Masoud Daneshtalab
MQ
119
0
0
30 Sep 2025
The 1st Solution for MOSEv1 Challenge on LSVOS 2025: CGFSeg
Tingmin Li
Yixuan Li
Yang Yang
VOS
152
0
0
30 Sep 2025
A Systematic Study of Large Language Models for Task and Motion Planning With PDDLStream
Jorge Mendez-Mendez
LRM
73
0
0
30 Sep 2025
Triangle Splatting+: Differentiable Rendering with Opaque Triangles
Jan Held
Renaud Vandeghen
Sanghyun Son
Daniel Rebain
Matheus Gadelha
Yi Zhou
Ming-Chyuan Lin
Marc Van Droogenbroeck
Andrea Tagliasacchi
3DGS
106
1
0
29 Sep 2025
LayerD: Decomposing Raster Graphic Designs into Layers
Tomoyuki Suzuki
Kang-Jun Liu
Naoto Inoue
Kota Yamaguchi
127
3
0
29 Sep 2025
NeoWorld: Neural Simulation of Explorable Virtual Worlds via Progressive 3D Unfolding
Yanpeng Zhao
Shanyan Guan
Y Samuel Wang
Yanhao Ge
Wei-Jang Li
Xiaokang Yang
VGen
112
0
0
29 Sep 2025
IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks
Eric Hannus
Miika Malin
Tran Minh Son Le
Ville Kyrki
VLM
56
0
0
29 Sep 2025
Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models
Beomseok Kang
Niluthpol Chowdhury Mithun
Mikhail Sizintsev
Han-Pang Chiu
S. Samarasekera
93
0
0
28 Sep 2025
Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning
Y Samuel Wang
Zeyu Xue
Mujie Liu
Tongqin Zhang
Yan Hu
Zhou Zhao
Chenguang Yang
Zhenyu Lu
144
0
0
27 Sep 2025
MultiCrafter: High-Fidelity Multi-Subject Generation via Disentangled Attention and Identity-Aware Preference Alignment
Tao Wu
Yibo Jiang
Yehao Lu
Zhizhong Wang
Longxiang Zhang
Zequn Qin
Xi Li
140
0
0
26 Sep 2025
SAGE: Scene Graph-Aware Guidance and Execution for Long-Horizon Manipulation Tasks
Jialiang Li
Wenzheng Wu
Gaojing Zhang
Yifan Han
Wenzhao Lian
LM&Ro
93
0
0
26 Sep 2025