ResearchTrend.AI
SAM 2: Segment Anything in Images and Videos

International Conference on Learning Representations (ICLR), 2025
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM, MLLM
arXiv:2408.00714

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 863 papers shown
Semantic Exploration and Dense Mapping of Complex Environments using Ground Robot with Panoramic LiDAR-Camera Fusion
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Xiaoyang Zhan
Shixin Zhou
Qianqian Yang
Yixuan Zhao
Hao Liu
Srinivas Chowdary Ramineni
K. Shimada
28 May 2025
InfoSAM: Fine-Tuning the Segment Anything Model from An Information-Theoretic Perspective
Yuanhong Zhang
Muyao Yuan
Weizhan Zhang
Tieliang Gong
Wen Wen
Jiangyong Ying
Weijie Shi
VLM
28 May 2025
DexUMI: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation
Mengda Xu
Han Zhang
Yifan Hou
Zhenjia Xu
Linxi Fan
Manuela Veloso
Shuran Song
28 May 2025
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang
Zunnan Xu
Jun Zhou
Ting Liu
Yicheng Xiao
Mingwen Ou
Bowen Ji
Xiu Li
Kehong Yuan
VLM
28 May 2025
Geometric Feature Prompting of Image Segmentation Models
Kenneth Ball
Erin Taylor
Nirav Patel
Andrew Bartels
Gary Koplik
James Polly
Jay Hineman
VLM
27 May 2025
SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation
Claudia Cuttano
Gabriele Trivigno
Giuseppe Averta
Carlo Masone
VLM
27 May 2025
PartInstruct: Part-level Instruction Following for Fine-grained Robot Manipulation
Robotics (RAS), 2025
Yifan Yin
Zhengtao Han
Shivam Aarya
Jianxin Wang
Shuhang Xu
Jiawei Peng
Angtian Wang
Alan Yuille
Tianmin Shu
LM&Ro
27 May 2025
OccLE: Label-Efficient 3D Semantic Occupancy Prediction
N. Fang
Zheyuan Zhou
Fayao Liu
Xulei Yang
Jiacheng Wei
Lemiao Qiu
Guosheng Lin
3DPC
27 May 2025
Frame In-N-Out: Unbounded Controllable Image-to-Video Generation
Boyang Wang
Xuweiyi Chen
Matheus Gadelha
Zezhou Cheng
DiffM, VGen
27 May 2025
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
Huanyi Zheng
Hao Zhong
Canyu Zhao
Zongze Du
Zheng Huang
...
Hao Chen
Cheng Zou
Jingdong Chen
Ming-Hsuan Yang
Chunhua Shen
LRM
27 May 2025
AniCrafter: Customizing Realistic Human-Centric Animation via Avatar-Background Conditioning in Video Diffusion Models
Muyao Niu
Mingdeng Cao
Yifan Zhan
Qingtian Zhu
Mingze Ma
Jiancheng Zhao
Yanhong Zeng
Zhihang Zhong
Xiao Sun
Yinqiang Zheng
DiffM, VGen
26 May 2025
CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting
Lei Tian
Xiaomin Li
Liqian Ma
Hefei Huang
Zirui Zheng
Hao Yin
Taiqing Li
Huchuan Lu
Xu Jia
26 May 2025
CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos
Shiu-hong Kao
Yu-Wing Tai
Chi-Keung Tang
VOS, MLLM, VGen, LRM
24 May 2025
Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning
Junlin Wang
Zhiyun Lin
24 May 2025
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection
Zhenglin Huang
Tianxiao Li
Xiangtai Li
Haiquan Wen
Yiwei He
...
Hao Fei
Xi Yang
Xiaowei Huang
Bei Peng
Guangliang Cheng
24 May 2025
Instruct2See: Learning to Remove Any Obstructions Across Distributions
Junhang Li
Yu Guo
Chuhua Xian
Shengfeng He
23 May 2025
Weakly-supervised Mamba-Based Mastoidectomy Shape Prediction for Cochlear Implant Surgery Using 3D T-Distribution Loss
Yike Zhang
Jack H. Noble
23 May 2025
Track Anything Annotate: Video annotation and dataset generation of computer vision models
Nikita Ivanov
Mark Klimov
Dmitry Glukhikh
Tatiana Chernysheva
Igor Glukhikh
VGen
23 May 2025
REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders
Savya Khosla
Sethuraman TV
Barnett Lee
Alexander Schwing
Derek Hoiem
VGen
23 May 2025
ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback
Litao Guo
Xinli Xu
Luozhou Wang
Jiantao Lin
Jinsong Zhou
Zixin Zhang
Bolan Su
Ying-Cong Chen
LLMAG, LRM
23 May 2025
Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery
Ming Hu
Zhendi Yu
Feilong Tang
Kaiwen Chen
Yulong Li
Imran Razzak
Junjun He
Tolga Birdal
Kaijing Zhou
Zongyuan Ge
23 May 2025
H2-COMPACT: Human-Humanoid Co-Manipulation via Adaptive Contact Trajectory Policies
Geeta Chandra Raju Bethala
Niraj Pudasaini
Abdullah Mohamed Ali
Shuaihang Yuan
Congcong Wen
Anthony Tzes
Yi Fang
23 May 2025
Auto-nnU-Net: Towards Automated Medical Image Segmentation
Jannis Becktepe
Leona Hennig
Steffen Oeltze-Jafra
Marius Lindauer
22 May 2025
gen2seg: Generative Models Enable Generalizable Instance Segmentation
Om Khangaonkar
Hamed Pirsiavash
DiffM, VLM
21 May 2025
From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Xiuchao Sui
Daiying Tian
Qi Sun
Ruirui Chen
Dongkyu Choi
Kenneth Kwok
Soujanya Poria
LM&Ro
21 May 2025
Advancing Marine Research: UWSAM Framework and UIIS10K Dataset for Precise Underwater Instance Segmentation
Hua Li
Shijie Lian
Zhiyuan Li
Runmin Cong
Sam Kwong
Laurence Tianruo Yang
Weidong Zhang
VLM
21 May 2025
TAGS: 3D Tumor-Adaptive Guidance for SAM
Sirui Li
Linkai Peng
Zheyuan Zhang
Gorkem Durak
Ulas Bagci
MedIm, VLM
21 May 2025
Scaling Vision Mamba Across Resolutions via Fractal Traversal
Bo Li
Haoke Xiao
Lv Tang
Mamba
20 May 2025
Unlocking the Power of SAM 2 for Few-Shot Segmentation
Qianxiong Xu
Lanyun Zhu
Xuanyi Liu
Guosheng Lin
Cheng Long
Ziyue Li
Rui Zhao
VLM
20 May 2025
GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
Abhay Deshpande
Yuquan Deng
Arijit Ray
Jordi Salvador
Winson Han
Jiafei Duan
Kuo-Hao Zeng
Yuke Zhu
Ranjay Krishna
Rose Hendrix
19 May 2025
Improving Compositional Generation with Diffusion Models Using Lift Scores
Chenning Yu
Sicun Gao
DiffM
19 May 2025
3D Visual Illusion Depth Estimation
Chengtang Yao
Zhidan Liu
Jiaxi Zeng
Lidong Yu
Yuwei Wu
Yunde Jia
MDE
19 May 2025
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM, LRM
17 May 2025
MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection
Shrutarv Awasthi
Anas Gouda
Sven Franke
Jérôme Rutinowski
Frank Hoffmann
Moritz Roidl
16 May 2025
Visual Planning: Let's Think Only with Images
Yi Xu
Chengzu Li
Han Zhou
Xingchen Wan
Caiqi Zhang
Anna Korhonen
Ivan Vulić
LM&Ro, LRM
16 May 2025
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation
Enyu Zhao
Vedant Raval
Hejia Zhang
Jiageng Mao
Zeyu Shangguan
Stefanos Nikolaidis
Yun Wang
Daniel Seita
LM&Ro, CoGe
14 May 2025
Air-Ground Collaboration for Language-Specified Missions in Unknown Environments
Fernando Cladera
Zachary Ravichandran
Jason Hughes
Varun Murali
Carlos Nieto-Granda
M. Hsieh
George J. Pappas
Camillo J Taylor
Vijay Kumar
14 May 2025
Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness
Reihaneh Mirjalili
Tobias Jülg
Florian Walter
Wolfram Burgard
13 May 2025
ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Haofeng Liu
Mingqi Gao
Xuxiao Luo
Ziyue Wang
Guanyi Qin
Jinlin Wu
Yueming Jin
13 May 2025
Extracting Visual Plans from Unlabeled Videos via Symbolic Guidance
Wenyan Yang
Ahmet Tikna
Yi Zhao
Yuying Zhang
Luigi Palopoli
Marco Roveri
Joni Pajarinen
VGen
13 May 2025
When Dance Video Archives Challenge Computer Vision
Philippe Colantoni
Rafique Ahmed
Prashant Ghimire
Damien Muselet
A. Trémeau
3DH
12 May 2025
ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation
Feng Yuan
Yifan Gao
Wenbin Wu
Keqing Wu
Xiaotong Guo
Jie Jiang
Xin Gao
Mamba
12 May 2025
The First WARA Robotics Mobile Manipulation Challenge -- Lessons Learned
European Conference on Mobile Robots (ECMR), 2025
David Cáceres-Domínguez
M. Iannotta
Abhishek Kashyap
Shuo Sun
Yuxuan Yang
...
Zheng Jia
Graziano Carriero
Sofia Lindqvist
Silvio Di Castro
Matteo Iovino
11 May 2025
Morphologically Symmetric Reinforcement Learning for Ambidextrous Bimanual Manipulation
Zechu Li
Yufeng Jin
Daniel Felipe Ordoñez Apraez
Claudio Semini
Puze Liu
Georgia Chalvatzaki
08 May 2025
D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation
Isabella Liu
Jason Chen
Gaurav Sukhatme
Daniel Seita
08 May 2025
UncertainSAM: Fast and Efficient Uncertainty Quantification of the Segment Anything Model
Timo Kaiser
Thomas Norrenbrock
Bodo Rosenhahn
08 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Computer Vision and Pattern Recognition (CVPR), 2025
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Yulin Chen
Zhuotao Tian
VLM
07 May 2025
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
Sameer Malik
Moyuru Yamada
Ayush Singh
Dishank Aggarwal
06 May 2025
DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes
S. Linok
Vadim Semenov
Anastasia Trunova
Oleg Bulichev
Dmitry A. Yudin
06 May 2025
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Gabriele Rosi
Fabio Cermelli
VLM
06 May 2025
Page 11 of 18