2304.02643
Segment Anything
5 April 2023
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
Laura Gustafson
Tete Xiao
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
Papers citing
"Segment Anything"
50 / 4,200 papers shown
Title
On Moving Object Segmentation from Monocular Video with Transformers
Christian Homeyer
Christoph Schnörr
107
3
0
28 Nov 2024
COMPrompter: reconceptualized segment anything model with multiprompt network for camouflaged object detection
Xiaoqin Zhang
Zhenni Yu
Li Zhao
Deng-Ping Fan
Guobao Xiao
VLM
76
0
0
28 Nov 2024
InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception
Haijie Li
Y. Wu
Jiarui Meng
Qiankun Gao
Zhiyao Zhang
Ronggang Wang
Jian Zhang
ISeg
91
2
0
28 Nov 2024
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents
Joongwon Chae
Zhenyu Wang
Peiwu Qin
Dongmei Yu
66
0
0
27 Nov 2024
Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models
Jiaheng Liu
Yumeng Li
Boyuan Xiao
Yichang Jian
Ziang Qin
Tianjia Shao
Yao-Xiang Ding
Kun Zhou
MLLM
LRM
105
4
0
27 Nov 2024
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Qing Jiang
Gen Luo
Yuqin Yang
Yuda Xiong
Yihao Chen
Zhaoyang Zeng
Tianhe Ren
Lei Zhang
VLM
LRM
109
7
0
27 Nov 2024
Rapid Distributed Fine-tuning of a Segmentation Model Onboard Satellites
Meghan Plumridge
Rasmus Maråk
Chiara Ceccobello
Pablo Gómez
Gabriele Meoni
F. Svoboda
Nicholas D. Lane
68
0
0
26 Nov 2024
HyperSeg: Towards Universal Visual Segmentation with Large Language Model
Cong Wei
Yujie Zhong
Haoxian Tan
Yong Liu
Zheng Zhao
Jie Hu
Yujiu Yang
VOS
MLLM
VLM
LRM
91
1
0
26 Nov 2024
Distractor-free Generalizable 3D Gaussian Splatting
Yanqi Bao
Jing Liao
Jing Huo
Yang Gao
3DGS
97
1
0
26 Nov 2024
A Distractor-Aware Memory for Visual Object Tracking with SAM2
Jovana Videnovic
A. Lukežič
Matej Kristan
VLM
91
2
0
26 Nov 2024
SAM-MPA: Applying SAM to Few-shot Medical Image Segmentation using Mask Propagation and Auto-prompting
Jie Xu
Xiaokang Li
Chengyu Yue
Yuanyuan Wang
Yi Guo
MedIm
86
1
0
26 Nov 2024
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator
Fan Yang
Ru Zhen
Jinqiao Wang
Yanhao Zhang
Haoxiang Chen
Haonan Lu
Sicheng Zhao
Guiguang Ding
83
0
0
26 Nov 2024
Exploring Aleatoric Uncertainty in Object Detection via Vision Foundation Models
Peng Cui
Guande He
Dan Zhang
Zhijie Deng
Yinpeng Dong
Jun Zhu
92
1
0
26 Nov 2024
Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation
Xiang Li
Zixuan Huang
Anh Thai
James M. Rehg
3DGS
87
0
0
26 Nov 2024
ΩSFormer: Dual-Modal Ω-like Super-Resolution Transformer Network for Cross-scale and High-accuracy Terraced Field Vectorization Extraction
Chang Li
Yu Wang
Chuxu Zhang
Yongjun Zhang
61
0
0
26 Nov 2024
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
Zhongyu Xia
Jishuo Li
Zhiwei Lin
Xinhao Wang
Yansen Wang
Ming-Hsuan Yang
VLM
87
2
0
26 Nov 2024
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Xinhao Liu
Jiajian Li
Yichen Jiang
Niranjan Sujay
Zheng Yang
Juexiao Zhang
John Abanes
Jing Zhang
Chen Feng
116
2
0
26 Nov 2024
GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration
Sudarshan Rajagopalan
Nithin Gopalakrishnan Nair
Jay N. Paranjape
Vishal M. Patel
DiffM
98
0
0
26 Nov 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
112
2
0
26 Nov 2024
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim
Dayun Ju
Woojung Han
Ming-Hsuan Yang
Seong Jae Hwang
VLM
VOS
89
0
0
26 Nov 2024
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
Bastian Wittmann
Yannick Wattenberg
Tamaz Amiranashvili
Suprosanna Shit
Bjoern H. Menze
92
3
0
26 Nov 2024
Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Xuweiyi Chen
Markus Marks
Zezhou Cheng
89
0
0
25 Nov 2024
Open Vocabulary Monocular 3D Object Detection
Jin Yao
Hao Gu
Xuweiyi Chen
Jiayun Wang
Zezhou Cheng
ObjD
VLM
78
3
0
25 Nov 2024
Diffusion Features for Zero-Shot 6DoF Object Pose Estimation
Bernd Von Gimborn
P. Ausserlechner
Markus Vincze
S. Thalhammer
DiffM
76
0
0
25 Nov 2024
Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
Ronghuan Wu
Wanchao Su
Jing Liao
DiffM
79
1
0
25 Nov 2024
J-CaPA : Joint Channel and Pyramid Attention Improves Medical Image Segmentation
Marzia Binta Nizam
Marian Zlateva
James Davis
MedIm
69
0
0
25 Nov 2024
One Diffusion to Generate Them All
Duong H. Le
Tuan Pham
Sangho Lee
Christopher Clark
Aniruddha Kembhavi
Stephan Mandt
Ranjay Krishna
Jiasen Lu
VLM
84
5
0
25 Nov 2024
Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery
Bhuvan Sachdeva
Naren Akash
Tajamul Ashraf
Simon Muller
T. Schultz
M. Wintergerst
Niharika Singri Prasad
K. Murali
Mohit Jain
86
0
0
25 Nov 2024
Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking
P. Nguyen
Minh Luu
Anh Tran
Cuong Pham
K. Nguyen
3DPC
87
0
0
25 Nov 2024
Med-PerSAM: One-Shot Visual Prompt Tuning for Personalized Segment Anything Model in Medical Domain
Hangyul Yoon
Doohyuk Jang
JungEun Kim
Eunho Yang
VLM
MedIm
77
1
0
25 Nov 2024
Boosting 3D Object Generation through PBR Materials
Yitong Wang
Xudong Xu
Li Ma
Haozhao Wang
Bo Dai
83
3
0
25 Nov 2024
Language Driven Occupancy Prediction
Zhu Yu
Bowen Pang
Lizhe Liu
Runmin Zhang
Qihao Peng
Maochun Luo
Sheng Yang
Mingxia Chen
Si-Yuan Cao
Hui-Liang Shen
97
2
0
25 Nov 2024
ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
Haozhan Shen
Kangjia Zhao
Tiancheng Zhao
Ruochen Xu
Zilun Zhang
Mingwei Zhu
Jianwei Yin
97
4
0
25 Nov 2024
Phys4DGen: Physics-Compliant 4D Generation with Multi-Material Composition Perception
Jiajing Lin
Zhenzhong Wang
Shu Jiang
Yongjie Hou
Min Jiang
VGen
79
0
0
25 Nov 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
109
1
0
25 Nov 2024
UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image
Xingyu Liu
Gu Wang
Ruida Zhang
Chenyangguang Zhang
F. Tombari
Xiangyang Ji
290
2
0
25 Nov 2024
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Yuhang Yang
Jinhong Deng
Wen Li
Lixin Duan
VLM
83
0
0
24 Nov 2024
Bundle Adjusted Gaussian Avatars Deblurring
Muyao Niu
Yifan Zhan
Qingtian Zhu
Zechao Li
Wei Wang
Zhihang Zhong
Xingchen Sun
Yinqiang Zheng
3DGS
88
0
0
24 Nov 2024
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions
Guanyu Zhou
Xiaohan Yu
Wenxin Huang
Xuemei Jia
Xian Zhong
Chia-Wen Lin
CML
81
0
0
24 Nov 2024
ROOT: VLM based System for Indoor Scene Understanding and Beyond
Yonghui Wang
Shi-Yong Chen
Zhenxing Zhou
Siyi Li
Haoran Li
Wengang Zhou
Haoyang Li
VLM
72
3
0
24 Nov 2024
AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks
Yuchen Li
Fan Ma
Yi Yang
DiffM
154
2
0
24 Nov 2024
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
Qifan Yu
Wei Chow
Zhongqi Yue
Kaihang Pan
Yang Wu
Xiaoyang Wan
Juncheng Billy Li
Siliang Tang
Hao Zhang
Yueting Zhuang
DiffM
112
17
0
24 Nov 2024
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
Sule Bai
Yong-Jin Liu
Yifei Han
Haoji Zhang
Yansong Tang
VLM
87
3
0
24 Nov 2024
Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data
Rui Huang
Henry Zheng
Yan Wang
Zhuofan Xia
Marco Pavone
Gao Huang
3DPC
VLM
96
1
0
23 Nov 2024
Fine-Grained Open-Vocabulary Object Recognition via User-Guided Segmentation
Jinwoo Ahn
Hyeokjoon Kwon
Hwiyeon Yoo
ObjD
VLM
82
0
0
23 Nov 2024
CellPilot
Philipp Endres
Valentin Koch
Julia A. Schnabel
Carsten Marr
VLM
MedIm
76
0
0
23 Nov 2024
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Chaehun Shin
Jooyoung Choi
Heeseung Kim
Sungroh Yoon
DiffM
94
8
0
23 Nov 2024
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua
Qing Liu
Lingzhi Zhang
Jing Shi
Zhifei Zhang
Yilin Wang
Jianming Zhang
Jiebo Luo
CoGe
VLM
103
6
0
23 Nov 2024
There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks
Miguel Espinosa
Chenhongyi Yang
Linus Ericsson
Jingyu Sun
Elliot J. Crowley
VLM
80
0
0
22 Nov 2024
Benchmarking the Robustness of Optical Flow Estimation to Corruptions
Zhonghua Yi
Hao-miao Shi
Zhijie Xu
Yao Gao
Ze Wang
Yanmei Zhang
Kailun Yang
Kaiwei Wang
AAML
87
1
0
22 Nov 2024