arXiv: 2408.00714
SAM 2: Segment Anything in Images and Videos


International Conference on Learning Representations (ICLR), 2024
1 August 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
Tengyu Ma
Haitham Khedr
Roman Rädle
Chloe Rolland
Laura Gustafson
Eric Mintun
Junting Pan
Kalyan Vasudev Alwala
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM, MLLM
HuggingFace: 116 upvotes

Papers citing "SAM 2: Segment Anything in Images and Videos"

50 / 863 papers shown
Visual Imitation Enables Contextual Humanoid Control
Arthur Allshire
Hongsuk Choi
Junyi Zhang
David McAllister
Anthony Zhang
Chung Min Kim
Trevor Darrell
Pieter Abbeel
Jitendra Malik
Angjoo Kanazawa
LM&Ro
1.3K
35
0
06 May 2025
6D Pose Estimation on Spoons and Hands
Kevin Tan
Fan Yang
Yuxiao Chen
247
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
1.2K
32
0
05 May 2025
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Cunxin Fan
Xiaosong Jia
Yihang Sun
Yixiao Wang
Jianglan Wei
...
Xiangyu Zhao
Masayoshi Tomizuka
Songyuan Li
Junchi Yan
Mingyu Ding
LM&Ro, VLM
379
25
0
04 May 2025
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Volodymyr Havrylov
Haiwen Huang
Dan Zhang
Andreas Geiger
1.1K
1
0
04 May 2025
SignSplat: Rendering Sign Language via Gaussian Splatting
Maksym Ivashechkin
Oscar Mendez
Richard Bowden
3DGS
368
1
0
04 May 2025
Prompt-responsive Object Retrieval with Memory-augmented Student-Teacher Learning
IEEE International Conference on Robotics and Automation (ICRA), 2025
Malte Mosbach
Sven Behnke
197
0
0
04 May 2025
Segment Any RGB-Thermal Model with Language-aided Distillation
Dong Xing
Xianxun Zhu
Wei Zhou
Qika Lin
Hang Yang
Yuqing Wang
VLM
475
0
0
04 May 2025
Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2
IEEE Transactions on Medical Imaging (IEEE TMI), 2025
Yuwen Chen
Zafer Yildiz
Qihang Li
Yaqian Chen
Haoyu Dong
Hanxue Gu
Nicholas Konz
Maciej A. Mazurowski
MedIm, VLM
485
1
0
03 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
282
2
0
03 May 2025
Can Foundation Models Really Segment Tumors? A Benchmarking Odyssey in Lung CT Imaging
Elena Mulero Ayllón
Massimiliano Mantegna
Linlin Shen
Paolo Soda
V. Guarrasi
M. Tortora
261
4
0
02 May 2025
Improving Editability in Image Generation with Layer-wise Memory
Computer Vision and Pattern Recognition (CVPR), 2025
Daneul Kim
Jaeah Lee
Jaesik Park
DiffM, KELM
299
1
0
02 May 2025
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Jiaxu Qian
Chendong Wang
Yue Yang
Chaoyun Zhang
Huiqiang Jiang
...
Saravan Rajmohan
Dongmei Zhang
Yifan Yang
Tao Gui
Lili Qiu
VLM
390
1
0
30 Apr 2025
UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
Linshan Wu
Yuxiang Nie
Sunan He
Jiaxin Zhuang
Hao Chen
...
Hao Chen
Ronald Cheong Kin Chan
Yifan Peng
Pranav Rajpurkar
Hao Chen
LM&MA, MedIm
665
5
0
30 Apr 2025
PRISM-DP: Spatial Pose-based Observations for Diffusion-Policies via Segmentation, Mesh Generation, and Pose Tracking
Xiatao Sun
Yinxing Chen
Daniel Rakita
VGen
397
5
0
29 Apr 2025
Dexonomy: Synthesizing All Dexterous Grasp Types in a Grasp Taxonomy
Jiayi Chen
Yubin Ke
Lin Peng
He Wang
307
13
0
26 Apr 2025
SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models
Nader Zantout
Haochen Zhang
Pujith Kachana
J. Qiu
Ji Zhang
Wenshan Wang
LM&Ro, LRM
797
6
0
25 Apr 2025
Step1X-Edit: A Practical Framework for General Image Editing
Shixuan Liu
Yucheng Han
Peng Xing
Fukun Yin
Rui Wang
...
Yibo Zhu
Binxing Jiao
Wei Wei
Gang Yu
Daxin Jiang
DiffM
762
174
0
24 Apr 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
Wenxuan Li
Hang Zhao
Zhiyuan Yu
Yu Du
Qin Zou
Ruizhen Hu
K. Xu
SSL
419
9
0
23 Apr 2025
Physically Consistent Humanoid Loco-Manipulation using Latent Diffusion Models
Ilyass Taouil
Haizhou Zhao
Angela Dai
Majid Khadiv
DiffM
271
1
0
23 Apr 2025
AffordanceSAM: Segment Anything Once More in Affordance Grounding
Dengyang Jiang
Zanyi Wang
Teli Ma
Haoyang Li
Wenshu Fan
Guang Dai
Lei Zhang
Mengmeng Wang
307
3
0
22 Apr 2025
Model-based Metric 3D Shape and Motion Reconstruction of Wild Bottlenose Dolphins in Drone-Shot Videos
Daniele Baieri
Riccardo Cicciarella
Michael Krützen
Emanuele Rodolà
Silvia Zuffi
405
2
0
22 Apr 2025
LSP-ST: Ladder Shape-Biased Side-Tuning for Robust Infrared Small Target Detection
Guoyi Zhang
Siyang Chen
Guangsheng Xu
Han Wang
Donghe Wang
Xiaohu Zhang
304
2
0
20 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
286
15
0
19 Apr 2025
HSACNet: Hierarchical Scale-Aware Consistency Regularized Semi-Supervised Change Detection
Qiáo Xu
Pengfei Wang
Yanjun Li
Tianwen Qian
Xiaoling Wang
182
0
0
18 Apr 2025
Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
Tyler Ga Wei Lum
Olivia Y. Lee
C. Karen Liu
Jeannette Bohg
410
16
0
17 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD, VOS
675
118
0
17 Apr 2025
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Rongtao Xu
Junxuan Zhang
Minghao Guo
Youpeng Wen
H. Yang
...
Liqiong Wang
Yuxuan Kuang
Meng Cao
Feng Zheng
Xiaodan Liang
631
31
0
17 Apr 2025
Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach
Lvpan Cai
Haowei Wang
Jinfa Huang
YanShu ZhouMen
Yiwei Ma
Xiaoshuai Sun
Liujuan Cao
ViT
397
5
0
16 Apr 2025
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection
Yuhao Chao
Jie Liu
J. Tang
Gangshan Wu
363
5
0
16 Apr 2025
How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
Computer Vision and Pattern Recognition (CVPR), 2025
Aditya Prakash
Benjamin Lundell
Dmitry Andreychuk
David Forsyth
Saurabh Gupta
H. Sawhney
352
6
0
16 Apr 2025
ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping
Computer Vision and Pattern Recognition (CVPR), 2025
Shun Iwase
Zubair Irshad
Katherine Liu
Vitor Campagnolo Guizilini
Robert Lee
...
Ayako Amma
Koichi Nishiwaki
Kris Kitani
Rares Andrei Ambrus
Sergey Zakharov
347
5
0
15 Apr 2025
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild
Henghui Ding
Chang Liu
Nikhila Ravi
Shuting He
Y. Wei
...
Haobo Yuan
Xuelong Li
Tao Zhang
Lu Qi
Ming-Hsuan Yang
344
4
0
15 Apr 2025
OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding
Dianbing Xi
Jiadong Wang
Yuanzhi Liang
Xi Qiu
Yuchi Huo
Ruiqi Wang
Fangqiu Yi
Xuzhao Li
DiffM, VGen
605
12
0
15 Apr 2025
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
Computer Vision and Pattern Recognition (CVPR), 2025
Jingshun Huang
Haitao Lin
Tianyu Wang
Yanwei Fu
Xiangyang Xue
Yinlin Zhu
3DPC
446
3
0
15 Apr 2025
Aligning Anime Video Generation with Human Feedback
Bingwen Zhu
Yudong Jiang
Baohan Xu
Siqian Yang
Mingyu Yin
Yidi Wu
Huyang Sun
Zuxuan Wu
EGVM, VGen
392
5
0
14 Apr 2025
MASSeg : 2nd Technical Report for 4th PVUW MOSE Track
Xuqiang Cao
Linnan Zhao
Jiaxuan Zhao
Fang Liu
Puhua Chen
Wenping Ma
231
0
0
14 Apr 2025
Enhanced Semantic Extraction and Guidance for UGC Image Super Resolution
Yiwen Wang
Ying Liang
Yuxuan Zhang
Xinning Chai
Zhengxue Cheng
Yingsheng Qin
Yucai Yang
Rong Xie
Li Song
368
3
0
14 Apr 2025
FVOS for MOSE Track of 4th PVUW Challenge: 3rd Place Solution
Mengjiao Wang
Junpei Zhang
Xu Liu
Yuting Yang
Mengru Ma
VOS
154
0
0
13 Apr 2025
ToolTipNet: A Segmentation-Driven Deep Learning Baseline for Surgical Instrument Tip Detection
Zijian Wu
Shuojue Yang
Yueming Jin
Septimiu E. Salcudean
MedIm
349
1
0
13 Apr 2025
PathSeqSAM: Sequential Modeling for Pathology Image Segmentation with SAM2
Mingyang Zhu
Yinting Liu
Mingyu Li
Jiacheng Wang
121
0
0
12 Apr 2025
FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents
Xin Tan
Yuzhou Ji
He Zhu
Yuan Xie
3DGS
221
2
0
11 Apr 2025
DSM: Constructing a Diverse Semantic Map for 3D Visual Grounding
Qinghongbing Xie
Zijian Liang
Fuhao Li
Long Zeng
317
0
0
11 Apr 2025
Parameter-Free Fine-tuning via Redundancy Elimination for Vision Foundation Models
Jiahuan Long
Tingsong Jiang
Wen Yao
Yizhe Xiong
Zhengqin Xu
Shuai Jia
Hanqing Liu
Chao Ma
219
0
0
11 Apr 2025
Palmprint De-Identification Using Diffusion Model for High-Quality and Diverse Synthesis
Licheng Yan
Bob Zhang
Andrew Beng Jin Teoh
L. Leng
Shuyi Li
Yuqi Wang
Ziyuan Yang
438
0
0
11 Apr 2025
RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements
Guangcong Zheng
Teng Li
Xianpan Zhou
Xi Li
VGen, 3DV
245
5
0
11 Apr 2025
CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model
Ruohao Zhan
Yijin Li
Yisheng He
Shuo Chen
Yichen Shen
Xinyu Chen
Zilong Dong
Zhaoyang Huang
Guofeng Zhang
DiffM
311
1
0
11 Apr 2025
DreamFuse: Adaptive Image Fusion with Diffusion Transformer
Junjia Huang
Pengxiang Yan
Jiyang Liu
Jie Wu
Zhao Wang
Yitong Wang
Guanbin Li
G. Li
221
5
0
11 Apr 2025
ZS-VCOS: Zero-Shot Video Camouflaged Object Segmentation By Optical Flow and Open Vocabulary Object Detection
Wenqi Guo
Mohamed Shehata
Shan Du
VLM
441
0
0
10 Apr 2025
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li
Ruoyi Du
Juncheng Yan
Le Zhuo
Zhen Li
Peng Gao
Zhanyu Ma
Ming-Ming Cheng
VLM
365
20
0
10 Apr 2025