SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

Computer Vision and Pattern Recognition (CVPR), 2021
21 January 2021
Brendan Duke
Abdalla Ahmed
Christian Wolf
P. Aarabi
Graham W. Taylor
    VOS
ArXiv (abs) | PDF | HTML | Github (87★)

Papers citing "SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation"

50 / 100 papers shown
Segment Anything Across Shots: A Method and Benchmark
Hengrui Hu
Kaining Ying
Henghui Ding
VOS
404
0
0
17 Nov 2025
MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes
Henghui Ding
Kaining Ying
Chang-rui Liu
Shuting He
Xudong Jiang
Yu-Gang Jiang
Juil Sock
Song Bai
VOS
416
40
0
07 Aug 2025
Advancing Complex Video Object Segmentation via Progressive Concept Construction
Zhixiong Zhang
Shuangrui Ding
Xiaoyi Dong
Songxin He
Jianfan Lin
Junsong Tang
Yuhang Zang
Yuhang Cao
Dahua Lin
Jiaqi Wang
VOS
305
14
0
21 Jul 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS, VLM
329
1
0
30 Apr 2025
MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection
Q. Yang
Xingtai Lv
Miaomiao Cui
Liefeng Bo
VLM
373
4
0
30 Apr 2025
EdgeTAM: On-Device Track Anything Model
Computer Vision and Pattern Recognition (CVPR), 2025
Chong Zhou
Chenchen Zhu
Yunyang Xiong
Saksham Suri
Fanyi Xiao
...
Raghuraman Krishnamoorthi
Bo Dai
Chen Change Loy
Vikas Chandra
Bilge Soran
VLM
369
19
0
13 Jan 2025
Efficient Track Anything
Yunyang Xiong
Chong Zhou
Xiaoyu Xiang
Lemeng Wu
Chenchen Zhu
...
Ramya Akula
Forrest N. Iandola
Raghuraman Krishnamoorthi
Bilge Soran
Vikas Chandra
VLM, VOS
307
18
0
28 Nov 2024
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
Shuangrui Ding
Rui Qian
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Yuwei Guo
Dahua Lin
Jiaqi Wang
VLM, VOS
434
68
0
21 Oct 2024
ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning
Jian Shi
Zhenyu Li
Peter Wonka
MDE
303
8
0
30 Sep 2024
X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation
ACM Multimedia (MM), 2024
Pinxue Guo
Wanyun Li
Hao Huang
Lingyi Hong
Xinyu Zhou
Zhaoyu Chen
Jinglun Li
Kaixun Jiang
Wei Zhang
Wenqiang Zhang
VLM, VOS
298
5
0
28 Sep 2024
AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation
Zili Wang
Qi Yang
Linsu Shi
Jiazhong Yu
M. Tanveer
Fei Li
Shiming Xiang
VOS
286
5
0
03 Aug 2024
Learning Natural Consistency Representation for Face Forgery Video Detection
Daichi Zhang
Zihao Xiao
Shikun Li
Fanzhao Lin
Jianmin Li
Shiming Ge
CVBM
382
35
0
15 Jul 2024
Learning Spatial-Semantic Features for Robust Video Object Segmentation
Xin Li
Deshui Miao
Zhenyu He
Longji Xu
Huchuan Lu
Ming-Hsuan Yang
VOS
373
5
0
10 Jul 2024
RMem: Restricted Memory Banks Improve Video Object Segmentation
Junbao Zhou
Ziqi Pang
Yu-Xiong Wang
VOS
481
26
0
12 Jun 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
433
1
0
04 Jun 2024
Spatial-Temporal Multi-level Association for Video Object Segmentation
European Conference on Computer Vision (ECCV), 2024
Deshui Miao
Xin Li
Zhenyu He
Huchuan Lu
Ming-Hsuan Yang
VOS
207
6
0
09 Apr 2024
Efficient Video Object Segmentation via Modulated Cross-Attention Memory
Abdelrahman M. Shaker
Syed Talal Wasim
Martin Danelljan
Salman Khan
Ming-Hsuan Yang
Fahad Shahbaz Khan
VOS
230
6
0
26 Mar 2024
Video Object Segmentation with Dynamic Query Modulation
IEEE International Conference on Multimedia and Expo (ICME), 2024
Hantao Zhou
Runze Hu
Xiu Li
VOS
210
4
0
18 Mar 2024
OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
European Conference on Computer Vision (ECCV), 2024
Wanyun Li
Pinxue Guo
Xinyu Zhou
Lingyi Hong
Yangji He
Xiangyu Zheng
Wei Zhang
Wenqiang Zhang
VOS
383
13
0
13 Mar 2024
Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
Tianxiang Chen
Zhentao Tan
Tao Gong
Qi Chu
Yue-bo Wu
Bin Liu
Le Lu
Jieping Ye
Nenghai Yu
VOS
348
10
0
04 Feb 2024
Self-supervised Video Object Segmentation with Distillation Learning of Deformable Attention
Quang-Trung Truong
Duc Thanh Nguyen
Binh-Son Hua
Sai-Kit Yeung
VOS
393
3
0
25 Jan 2024
TAM-VT: Transformation-Aware Multi-scale Video Transformer for Segmentation and Tracking
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Raghav Goyal
Wan-Cyuan Fan
Mennatullah Siam
Leonid Sigal
VOS
318
4
0
13 Dec 2023
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang
Xing Nie
Tong Li
Pengfei Gao
Ying Guo
Cheng Zhen
Pengfei Yan
Shiming Xiang
VOS
262
29
0
11 Dec 2023
Putting the Object Back into Video Object Segmentation
Ho Kei Cheng
Seoung Wug Oh
Brian Price
Joon-Young Lee
Alexander Schwing
VOS
500
212
0
19 Oct 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Yuxin Mao
Jing Zhang
Mochu Xiang
Yiran Zhong
Yuchao Dai
213
57
0
12 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
IEEE Transactions on Multimedia (IEEE TMM), 2023
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
521
14
0
10 Oct 2023
Segmenting the motion components of a video: A long-term unsupervised model
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
E. Meunier
P. Bouthemy
550
1
0
02 Oct 2023
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Computer Vision and Pattern Recognition (CVPR), 2023
Xiang Li
Jinglu Wang
Xiaohao Xu
Xiulian Peng
Rita Singh
Yan Lu
Bhiksha Raj
VOS
416
19
0
29 Sep 2023
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
ACM Multimedia (ACM MM), 2023
Kexin Li
Zongxin Yang
Lei Chen
Yezhou Yang
Jun Xiao
VOS
357
88
0
18 Sep 2023
Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Shaofei Huang
Han Li
Yuqing Wang
Hongji Zhu
Jiao Dai
Jizhong Han
Wenge Rong
Si Liu
VOS
217
33
0
18 Sep 2023
Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Yuanyou Xu
Zongxin Yang
Yi Yang
VOS
419
18
0
25 Aug 2023
Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation
Kailai Li
Yutao Cui
Gangshan Wu
Limin Wang
VOS
326
10
0
25 Aug 2023
Scalable Video Object Segmentation with Simplified Framework
IEEE International Conference on Computer Vision (ICCV), 2023
Qiangqiang Wu
Tianyu Yang
WU Wei
Antoni B. Chan
VOS
252
48
0
19 Aug 2023
Improving Audio-Visual Segmentation with Bidirectional Generation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Dawei Hao
Yuxin Mao
Bowen He
Xiaodong Han
Yuchao Dai
Yiran Zhong
VOS, VGen
253
53
0
16 Aug 2023
Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Yichen Yuan
Yifan Wang
Lijun Wang
Xiaoqi Zhao
Huchuan Lu
Yu Wang
Wei Su
Lei Zhang
VOS
254
16
0
13 Aug 2023
Learning Referring Video Object Segmentation from Weak Annotation
Wangbo Zhao
Ke Nan
Songyang Zhang
Kai-xiang Chen
Dahua Lin
Yang You
VOS
291
7
0
04 Aug 2023
Contrastive Conditional Latent Diffusion for Audio-visual Segmentation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Yuxin Mao
Jing Zhang
Mochu Xiang
Yun-Qiu Lv
Dong Li
Yiran Zhong
Yuchao Dai
DiffM
474
44
0
31 Jul 2023
Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation
Jinxian Liu
Chen Ju
Chaofan Ma
Yanfeng Wang
Yu Wang
Ya Zhang
VOS
440
39
0
25 Jul 2023
Hierarchical Spatiotemporal Transformers for Video Object Segmentation
Jun-Sang Yoo
H. Lee
Seung‐Won Jung
VOS
205
2
0
17 Jul 2023
MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
Jakob Drachmann Havtorn
Amelie Royer
Tijmen Blankevoort
B. Bejnordi
327
22
0
05 Jul 2023
AVSegFormer: Audio-Visual Segmentation with Transformer
AAAI Conference on Artificial Intelligence (AAAI), 2023
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
VOS
535
94
0
03 Jul 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
International Conference on Machine Learning (ICML), 2023
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
463
373
0
01 Jun 2023
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
Neural Information Processing Systems (NeurIPS), 2023
Zhuoyan Luo
Yicheng Xiao
Yong-Jin Liu
Shuyan Li
Yitong Wang
Yansong Tang
Xiu Li
Yujiu Yang
VOS
237
73
0
26 May 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Guoying Gu
Yu Qiao
Hao Dong
Zhongjiang He
Shiyang Feng
VOS
383
66
0
25 May 2023
Annotation-free Audio-Visual Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya Zhang
Weidi Xie
VOS, VLM
456
54
0
18 May 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS, ViT
211
7
0
12 May 2023
Boosting Video Object Segmentation via Space-time Correspondence Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Yurong Zhang
Liulei Li
Wenguan Wang
Rong Xie
Li Song
Wenjun Zhang
VOS
276
47
0
13 Apr 2023
Co-attention Propagation Network for Zero-Shot Video Object Segmentation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Gensheng Pei
Yazhou Yao
Fumin Shen
Daniel Huang
Xing-Rui Huang
Hengtao Shen
VOS
303
16
0
08 Apr 2023
Online Lane Graph Extraction from Onboard Video
Y. Can
Alexander Liniger
D. Paudel
Luc Van Gool
243
4
0
03 Apr 2023
Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Cong Wei
Brendan Duke
R. Jiang
P. Aarabi
Graham W. Taylor
Florian Shkurti
ViT
223
27
0
24 Mar 2023