Video Swin Transformer

24 June 2021
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng-Wei Zhang
Stephen Lin
Han Hu
    ViT

Papers citing "Video Swin Transformer"

50 / 152 papers shown
SkeleTR: Towards Skeleton-based Action Recognition in the Wild
Haodong Duan
Mingze Xu
Bing Shuai
Davide Modolo
Zhuowen Tu
Joseph Tighe
Alessandro Bergamo
ViT
23
1
0
20 Sep 2023
MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings
Surbhi Madan
Rishabh Jain
Gulshan Sharma
Ramanathan Subramanian
Abhinav Dhall
25
2
0
19 Sep 2023
Interpretability-Aware Vision Transformer
Yao Qiang
Chengyin Li
Prashant Khanduri
D. Zhu
ViT
74
7
0
14 Sep 2023
Temporal Collection and Distribution for Referring Video Object Segmentation
Jiajin Tang
Ge Zheng
Sibei Yang
VOS
18
14
0
07 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
15
9
0
05 Sep 2023
Learning Sequential Information in Task-based fMRI for Synthetic Data Augmentation
Jiyao Wang
Nicha Dvornek
Lawrence H. Staib
James S. Duncan
MedIm
18
2
0
29 Aug 2023
NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos
Ziyuan Yang
Sucheng Ren
Zongwei Wu
Nanxuan Zhao
Junle Wang
Jing Qin
Shengfeng He
14
2
0
23 Aug 2023
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation
Hejun Xiao
Kunyu Peng
Xiangsheng Huang
Alina Roitberg
Hao Li
Zhao Wang
Rainer Stiefelhagen
11
3
0
23 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
22
29
0
21 Aug 2023
SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition
Xiao Wang
Zong-Yao Wu
Yao Rong
Lin Zhu
Bowei Jiang
Jin Tang
Yonghong Tian
ViT
64
14
0
08 Aug 2023
Data Augmentation for Human Behavior Analysis in Multi-Person Conversations
Kun Li
Dan Guo
Guoliang Chen
Feiyang Liu
Meng Wang
ViT
20
8
0
03 Aug 2023
A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and beyond
Junyu Chen
Yihao Liu
Shuwen Wei
Zhangxing Bian
Shalini Subramanian
A. Carass
Jerry L. Prince
Yong Du
OOD
28
36
0
28 Jul 2023
Weakly Supervised AI for Efficient Analysis of 3D Pathology Samples
Andrew H. Song
Mane Williams
Drew F. K. Williamson
Guillaume Jaume
Andrew Zhang
...
R. Serafin
Jonathan T. C. Liu
Alexander S. Baras
Anil V. Parwani
Faisal Mahmood
8
4
0
27 Jul 2023
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
20
7
0
27 Jul 2023
NTIRE 2023 Quality Assessment of Video Enhancement Challenge
Xiaohong Liu
Xiongkuo Min
Wei Sun
Yulun Zhang
K. Zhang
...
Te Shi
Azadeh Mansouri
Hossein Motamednia
Amirhossein Bakhtiari
Ahmad Mahmoudi-Aznaveh
23
18
0
19 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
33
8
0
18 Jul 2023
Hierarchical Spatiotemporal Transformers for Video Object Segmentation
Jun-Sang Yoo
H. Lee
Seung‐Won Jung
VOS
13
1
0
17 Jul 2023
Multiscale Memory Comparator Transformer for Few-Shot Video Segmentation
Mennatullah Siam
R. Karim
Henghui Zhao
Richard P. Wildes
VOS
14
2
0
15 Jul 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
30
14
0
20 Jun 2023
Vision Transformers for Mobile Applications: A Short Survey
Nahid Alam
Steven Kolawole
S. Sethi
Nishant Bansali
Karina Nguyen
ViT
16
3
0
30 May 2023
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
Jinqi Xiao
Miao Yin
Yu Gong
Xiao Zang
Jian Ren
Bo Yuan
VLM
ViT
25
9
0
26 May 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Hongyang Li
Yu Qiao
Hao Dong
Zhongjiang He
Peng Gao
VOS
11
29
0
25 May 2023
Efficient Video Action Detection with Token Dropout and Context Refinement
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
23
14
0
17 Apr 2023
Bodily expressed emotion understanding through integrating Laban movement analysis
Chenyan Wu
Dolzodmaa Davaasuren
T. Shafir
Rachelle Tsachor
James Z. Wang
25
6
0
05 Apr 2023
DOAD: Decoupled One Stage Action Detection Network
Shuning Chang
Pichao Wang
Fan Wang
Jiashi Feng
Mike Zheng Shou
10
4
0
01 Apr 2023
Hierarchical Vision Transformers for Cardiac Ejection Fraction Estimation
Lhuqita Fazry
Asep Haryono
Nuzulul Khairu Nissa
Sunarno
Naufal Muhammad Hirzi
M. F. Rachmadi
W. Jatmiko
MedIm
9
16
0
31 Mar 2023
SnakeVoxFormer: Transformer-based Single Image Voxel Reconstruction with Run Length Encoding
Jae Joong Lee
Bedrich Benes
ViT
16
0
0
28 Mar 2023
Transformer-based Multi-Instance Learning for Weakly Supervised Object Detection
Zhaofei Wang
Weijia Zhang
Min-Ling Zhang
ViT
WSOD
10
3
0
27 Mar 2023
MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
Zicheng Zhang
Wei Wu
Wei Sun
Dangyang Tu
Wei Lu
Xiongkuo Min
Ying Chen
Guangtao Zhai
44
39
0
27 Mar 2023
Multi-view knowledge distillation transformer for human action recognition
Yi Lin
Vincent S. Tseng
ViT
13
1
0
25 Mar 2023
Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution
Yunfan Lu
Zipeng Wang
Minjie Liu
Hongjian Wang
Lin Wang
SupR
16
30
0
24 Mar 2023
Confidence Attention and Generalization Enhanced Distillation for Continuous Video Domain Adaptation
Xiyu Wang
Yuecong Xu
Jianfei Yang
Xiaoli Li
Zhenghua Chen
TTA
19
0
0
18 Mar 2023
Enhanced detection of the presence and severity of COVID-19 from CT scans using lung segmentation
R. Turnbull
19
2
0
16 Mar 2023
PoseRAC: Pose Saliency Transformer for Repetitive Action Counting
Ziyu Yao
Xuxin Cheng
Yuexian Zou
ViT
16
19
0
15 Mar 2023
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
Weihong Zhong
Mao Zheng
Duyu Tang
Xuan Luo
Heng Gong
Xiaocheng Feng
Bing Qin
22
8
0
20 Feb 2023
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection
C. Nwoye
Tong Yu
Saurav Sharma
Aditya Murali
Deepak Alapatt
...
Pietro Mascagni
B. Seeliger
Cristians Gonzalez
Didier Mutter
N. Padoy
27
17
0
13 Feb 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
C. L. P. Chen
Mu Li
ViT
30
143
0
06 Feb 2023
ADAPT: Action-aware Driving Caption Transformer
Bu Jin
Xinyi Liu
Yupeng Zheng
Pengfei Li
Hao Zhao
Tong Zhang
Yuhang Zheng
Guyue Zhou
Jingjing Liu
13
69
0
01 Feb 2023
CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition
Cheng Lu
Xiaojie Jin
Zhicheng Huang
Qibin Hou
Ming-Ming Cheng
Jiashi Feng
24
8
0
15 Jan 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Bernard Ghanem
AAML
16
8
0
03 Jan 2023
1st Place Solution for YouTubeVOS Challenge 2022: Referring Video Object Segmentation
Zhiwei Hu
Bo Chen
Yuan Gao
Zhilong Ji
Jinfeng Bai
VOS
26
5
0
27 Dec 2022
Location-aware Adaptive Normalization: A Deep Learning Approach For Wildfire Danger Forecasting
Mohamad Hakam Shams Eddin
R. Roscher
Juergen Gall
19
11
0
16 Dec 2022
Egocentric Video Task Translation
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
16
13
0
13 Dec 2022
Fine-tuned CLIP Models are Efficient Video Learners
H. Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
F. Khan
CLIP
VLM
14
148
0
06 Dec 2022
Video Object of Interest Segmentation
Siyuan Zhou
Chunru Zhan
Biao Wang
T. Ge
Yuning Jiang
Li Niu
VOS
18
0
0
06 Dec 2022
BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks
Xiaowei Chi
Jiaming Liu
Ming Lu
Rongyu Zhang
Zhaoqing Wang
Yandong Guo
Shanghang Zhang
3DPC
30
18
0
02 Dec 2022
MGFN: Magnitude-Contrastive Glance-and-Focus Network for Weakly-Supervised Video Anomaly Detection
Y. Chen
Zhengzhe Liu
Baoheng Zhang
W. Fok
Xiaojuan Qi
Yik-Chung Wu
10
108
0
28 Nov 2022
Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
Chen Zhao
Shuming Liu
K. Mangalam
Bernard Ghanem
10
17
0
25 Nov 2022
Token Turing Machines
Michael S. Ryoo
K. Gopalakrishnan
Kumara Kahatapitiya
Ted Xiao
Kanishka Rao
Austin Stone
Yao Lu
Julian Ibarz
Anurag Arnab
27
21
0
16 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
31
671
0
14 Nov 2022