2101.08833
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Computer Vision and Pattern Recognition (CVPR), 2021
21 January 2021
Brendan Duke
Abdalla Ahmed
Christian Wolf
P. Aarabi
Graham W. Taylor
VOS
Github (87★)
Papers citing
"SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation"
50 / 100 papers shown
Segment Anything Across Shots: A Method and Benchmark
Hengrui Hu
Kaining Ying
Henghui Ding
VOS
404
0
0
17 Nov 2025
MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes
Henghui Ding
Kaining Ying
Chang-rui Liu
Shuting He
Xudong Jiang
Yu-Gang Jiang
Juil Sock
Song Bai
VOS
416
40
0
07 Aug 2025
Advancing Complex Video Object Segmentation via Progressive Concept Construction
Zhixiong Zhang
Shuangrui Ding
Xiaoyi Dong
Songxin He
Jianfan Lin
Junsong Tang
Yuhang Zang
Yuhang Cao
Dahua Lin
Jiaqi Wang
VOS
305
14
0
21 Jul 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
329
1
0
30 Apr 2025
MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection
Q. Yang
Xingtai Lv
Miaomiao Cui
Liefeng Bo
VLM
373
4
0
30 Apr 2025
EdgeTAM: On-Device Track Anything Model
Computer Vision and Pattern Recognition (CVPR), 2025
Chong Zhou
Chenchen Zhu
Yunyang Xiong
Saksham Suri
Fanyi Xiao
...
Raghuraman Krishnamoorthi
Bo Dai
Chen Change Loy
Vikas Chandra
Bilge Soran
VLM
369
19
0
13 Jan 2025
Efficient Track Anything
Yunyang Xiong
Chong Zhou
Xiaoyu Xiang
Lemeng Wu
Chenchen Zhu
...
Ramya Akula
Forrest N. Iandola
Raghuraman Krishnamoorthi
Bilge Soran
Vikas Chandra
VLM
VOS
307
18
0
28 Nov 2024
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Shuangrui Ding
Rui Qian
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Yuwei Guo
Dahua Lin
Jiaqi Wang
VLM
VOS
434
68
0
21 Oct 2024
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning
Jian Shi
Zhenyu Li
Peter Wonka
MDE
303
8
0
30 Sep 2024
X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation
ACM Multimedia (MM), 2024
Pinxue Guo
Wanyun Li
Hao Huang
Lingyi Hong
Xinyu Zhou
Zhaoyu Chen
Jinglun Li
Kaixun Jiang
Wei Zhang
Wenqiang Zhang
VLM
VOS
298
5
0
28 Sep 2024
AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation
Zili Wang
Qi Yang
Linsu Shi
Jiazhong Yu
M. Tanveer
Fei Li
Shiming Xiang
VOS
286
5
0
03 Aug 2024
Learning Natural Consistency Representation for Face Forgery Video Detection
Daichi Zhang
Zihao Xiao
Shikun Li
Fanzhao Lin
Jianmin Li
Shiming Ge
CVBM
382
35
0
15 Jul 2024
Learning Spatial-Semantic Features for Robust Video Object Segmentation
Xin Li
Deshui Miao
Zhenyu He
Longji Xu
Huchuan Lu
Ming-Hsuan Yang
VOS
373
5
0
10 Jul 2024
RMem: Restricted Memory Banks Improve Video Object Segmentation
Junbao Zhou
Ziqi Pang
Yu-Xiong Wang
VOS
481
26
0
12 Jun 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
433
1
0
04 Jun 2024
Spatial-Temporal Multi-level Association for Video Object Segmentation
European Conference on Computer Vision (ECCV), 2024
Deshui Miao
Xin Li
Zhenyu He
Huchuan Lu
Ming-Hsuan Yang
VOS
207
6
0
09 Apr 2024
Efficient Video Object Segmentation via Modulated Cross-Attention Memory
Abdelrahman M. Shaker
Syed Talal Wasim
Martin Danelljan
Salman Khan
Ming-Hsuan Yang
Fahad Shahbaz Khan
VOS
230
6
0
26 Mar 2024
Video Object Segmentation with Dynamic Query Modulation
IEEE International Conference on Multimedia and Expo (ICME), 2024
Hantao Zhou
Runze Hu
Xiu Li
VOS
210
4
0
18 Mar 2024
OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
European Conference on Computer Vision (ECCV), 2024
Wanyun Li
Pinxue Guo
Xinyu Zhou
Lingyi Hong
Yangji He
Xiangyu Zheng
Wei Zhang
Wenqiang Zhang
VOS
383
13
0
13 Mar 2024
Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
Tianxiang Chen
Zhentao Tan
Tao Gong
Qi Chu
Yue-bo Wu
Bin Liu
Le Lu
Jieping Ye
Nenghai Yu
VOS
348
10
0
04 Feb 2024
Self-supervised Video Object Segmentation with Distillation Learning of Deformable Attention
Quang-Trung Truong
Duc Thanh Nguyen
Binh-Son Hua
Sai-Kit Yeung
VOS
393
3
0
25 Jan 2024
TAM-VT: Transformation-Aware Multi-scale Video Transformer for Segmentation and Tracking
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Raghav Goyal
Wan-Cyuan Fan
Mennatullah Siam
Leonid Sigal
VOS
318
4
0
13 Dec 2023
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang
Xing Nie
Tong Li
Pengfei Gao
Ying Guo
Cheng Zhen
Pengfei Yan
Shiming Xiang
VOS
262
29
0
11 Dec 2023
Putting the Object Back into Video Object Segmentation
Ho Kei Cheng
Seoung Wug Oh
Brian Price
Joon-Young Lee
Alexander Schwing
VOS
500
212
0
19 Oct 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Yuxin Mao
Jing Zhang
Mochu Xiang
Yiran Zhong
Yuchao Dai
213
57
0
12 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
IEEE transactions on multimedia (IEEE TMM), 2023
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
521
14
0
10 Oct 2023
Segmenting the motion components of a video: A long-term unsupervised model
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
E. Meunier
P. Bouthemy
550
1
0
02 Oct 2023
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Computer Vision and Pattern Recognition (CVPR), 2023
Xiang Li
Jinglu Wang
Xiaohao Xu
Xiulian Peng
Rita Singh
Yan Lu
Bhiksha Raj
VOS
416
19
0
29 Sep 2023
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
ACM Multimedia (ACM MM), 2023
Kexin Li
Zongxin Yang
Lei Chen
Yezhou Yang
Jun Xiao
VOS
357
88
0
18 Sep 2023
Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Shaofei Huang
Han Li
Yuqing Wang
Hongji Zhu
Jiao Dai
Jizhong Han
Wenge Rong
Si Liu
VOS
217
33
0
18 Sep 2023
Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Yuanyou Xu
Zongxin Yang
Yi Yang
VOS
419
18
0
25 Aug 2023
Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation
Kailai Li
Yutao Cui
Gangshan Wu
Limin Wang
VOS
326
10
0
25 Aug 2023
Scalable Video Object Segmentation with Simplified Framework
IEEE International Conference on Computer Vision (ICCV), 2023
Qiangqiang Wu
Tianyu Yang
WU Wei
Antoni B. Chan
VOS
252
48
0
19 Aug 2023
Improving Audio-Visual Segmentation with Bidirectional Generation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Dawei Hao
Yuxin Mao
Bowen He
Xiaodong Han
Yuchao Dai
Yiran Zhong
VOS
VGen
253
53
0
16 Aug 2023
Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Yichen Yuan
Yifan Wang
Lijun Wang
Xiaoqi Zhao
Huchuan Lu
Yu Wang
Wei Su
Lei Zhang
VOS
254
16
0
13 Aug 2023
Learning Referring Video Object Segmentation from Weak Annotation
Wangbo Zhao
Ke Nan
Songyang Zhang
Kai-xiang Chen
Dahua Lin
Yang You
VOS
291
7
0
04 Aug 2023
Contrastive Conditional Latent Diffusion for Audio-visual Segmentation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Yuxin Mao
Jing Zhang
Mochu Xiang
Yun-Qiu Lv
Dong Li
Yiran Zhong
Yuchao Dai
DiffM
474
44
0
31 Jul 2023
Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
Jinxian Liu
Chen Ju
Chaofan Ma
Yanfeng Wang
Yu Wang
Ya Zhang
VOS
440
39
0
25 Jul 2023
Hierarchical Spatiotemporal Transformers for Video Object Segmentation
Jun-Sang Yoo
H. Lee
Seung‐Won Jung
VOS
205
2
0
17 Jul 2023
MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
Jakob Drachmann Havtorn
Amelie Royer
Tijmen Blankevoort
B. Bejnordi
327
22
0
05 Jul 2023
AVSegFormer: Audio-Visual Segmentation with Transformer
AAAI Conference on Artificial Intelligence (AAAI), 2023
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
VOS
535
94
0
03 Jul 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
International Conference on Machine Learning (ICML), 2023
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
463
373
0
01 Jun 2023
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
Neural Information Processing Systems (NeurIPS), 2023
Zhuoyan Luo
Yicheng Xiao
Yong-Jin Liu
Shuyan Li
Yitong Wang
Yansong Tang
Xiu Li
Yujiu Yang
VOS
237
73
0
26 May 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Guoying Gu
Yu Qiao
Hao Dong
Zhongjiang He
Shiyang Feng
VOS
383
66
0
25 May 2023
Annotation-free Audio-Visual Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya Zhang
Weidi Xie
VOS
VLM
456
54
0
18 May 2023
TransAVS: End-to-End Audio-Visual Segmentation with Transformer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS
ViT
211
7
0
12 May 2023
Boosting Video Object Segmentation via Space-time Correspondence Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Yurong Zhang
Liulei Li
Wenguan Wang
Rong Xie
Li Song
Wenjun Zhang
VOS
276
47
0
13 Apr 2023
Co-attention Propagation Network for Zero-Shot Video Object Segmentation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Gensheng Pei
Yazhou Yao
Fumin Shen
Daniel Huang
Xing-Rui Huang
Hengtao Shen
VOS
303
16
0
08 Apr 2023
Online Lane Graph Extraction from Onboard Video
Y. Can
Alexander Liniger
D. Paudel
Luc Van Gool
243
4
0
03 Apr 2023
Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Cong Wei
Brendan Duke
R. Jiang
P. Aarabi
Graham W. Taylor
Florian Shkurti
ViT
223
27
0
24 Mar 2023