Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.13230
Cited By
Video Swin Transformer
24 June 2021
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng-Wei Zhang
Stephen Lin
Han Hu
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Video Swin Transformer"
50 / 150 papers shown
Title
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining
Lu Dong
H. Zhang
Hongjie Zhang
Y. Huang
Z. Ling
Yu Qiao
Limin Wang
Y. Wang
AI4TS
16
0
0
10 May 2025
DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer
Ho-Joong Kim
Y. E. Lee
Jung-Ho Hong
Seong-Whan Lee
23
0
0
09 May 2025
T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models
Xuyang Guo
Jiayan Huo
Zhenmei Shi
Zhao-quan Song
Jiahao Zhang
Jiale Zhao
VGen
50
0
0
08 May 2025
Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision
Linhan Cao
Wei Sun
Kaiwei Zhang
Yicong Peng
Guangtao Zhai
Xiongkuo Min
47
0
0
06 May 2025
A Deep Learning approach for Depressive Symptoms assessment in Parkinson's disease patients using facial videos
Ioannis Kyprakis
Vasileios Skaramagkas
Iro Boura
Georgios Karamanis
D. Fotiadis
Zinovia Kefalopoulou
Cleanthe Spanaki
M. Tsiknakis
27
0
0
05 May 2025
T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation
Xuyang Guo
Jiayan Huo
Zhenmei Shi
Zhao-quan Song
Jiahao Zhang
Jiale Zhao
EGVM
VGen
PINN
75
1
0
01 May 2025
DS_FusionNet: Dynamic Dual-Stream Fusion with Bidirectional Knowledge Distillation for Plant Disease Recognition
Yanghui Song
Chengfu Yang
MedIm
29
0
0
29 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
76
0
0
28 Apr 2025
Hierarchical and Multimodal Data for Daily Activity Understanding
Ghazal Kaviani
Yavuz Yarici
Seulgi Kim
M. Prabhushankar
Ghassan AlRegib
Mashhour Solh
Ameya Patil
49
0
0
24 Apr 2025
Fast Adversarial Training with Weak-to-Strong Spatial-Temporal Consistency in the Frequency Domain on Videos
Songping Wang
Hanqing Liu
Yueming Lyu
Xiantao Hu
Ziwen He
W. Wang
Caifeng Shan
L. Wang
AAML
29
0
0
21 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
54
0
0
20 Apr 2025
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hoshanfar
Alireza Hosseini
Ahmad Kalhor
Babak Nadjar Araabi
44
0
0
14 Apr 2025
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Team Seawead
Ceyuan Yang
Zhijie Lin
Yang Zhao
Shanchuan Lin
...
Zuquan Song
Zhenheng Yang
Jiashi Feng
Jianchao Yang
Lu Jiang
DiffM
75
1
0
11 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
32
0
0
02 Apr 2025
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Jiaheng Zhou
Yanfeng Zhou
Wei Fang
Yuxing Tang
Le Lu
Ge Yang
Mamba
119
0
0
26 Mar 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
95
16
0
17 Jan 2025
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
Yunzhi Zhuge
Hongyu Gu
Lu Zhang
Jinqing Qi
Huchuan Lu
VOS
58
2
0
14 Jan 2025
Localization-Aware Multi-Scale Representation Learning for Repetitive Action Counting
Sujia Wang
Xiangwei Shen
Yansong Tang
Xin Luna Dong
Wenjia Geng
Lei Chen
39
0
0
13 Jan 2025
Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling
Jiebin Yan
Lei Wu
Yuming Fang
Xuelin Liu
Xue Xia
Weide Liu
62
2
0
13 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
54
23
0
31 Dec 2024
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen
DiffM
107
2
0
14 Dec 2024
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zenghui Ding
Xianjun Yang
Yining Sun
89
1
0
21 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
73
0
0
20 Nov 2024
Secure Video Quality Assessment Resisting Adversarial Attacks
Ao Zhang
Yu Ran
Weixuan Tang
Yuan-Gen Wang
Qingxiao Guan
Chunsheng Yang
AAML
22
0
0
09 Oct 2024
Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification
Xinrui Zhou
Yuhao Huang
Haoran Dou
Shijing Chen
Ao Chang
...
Jie Jessie Ren
Ruobing Huang
Jun Cheng
Wufeng Xue
Dong Ni
MedIm
45
0
0
25 Sep 2024
GMFL-Net: A Global Multi-geometric Feature Learning Network for Repetitive Action Counting
Jun Li
Jinying Wu
Qiming Li
Feifei Guo
21
0
0
31 Aug 2024
Start from Video-Music Retrieval: An Inter-Intra Modal Loss for Cross Modal Retrieval
Zeyu Chen
Pengfei Zhang
Kai Ye
Wei Dong
Xin Feng
Yana Zhang
28
0
0
28 Jul 2024
A Comprehensive Review of Few-shot Action Recognition
Yuyang Wanyan
Xiaoshan Yang
Weiming Dong
Changsheng Xu
VLM
49
3
0
20 Jul 2024
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li
Pengyu Liu
Pengyu Liu
Guoliang Chen
Zhiliang Wu
Hehe Fan
Meng Wang
29
2
0
07 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
29
52
0
30 Jun 2024
Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis
Md. Saiful Islam
Tariq Adnan
Jan Freyberg
Sangwu Lee
Abdelrahman Abdelkader
...
Cathe Schwartz
Karen Jaffe
Ruth B. Schneider
E. R. Dorsey
Ehsan Hoque
68
0
0
21 Jun 2024
Video Frame Interpolation for Polarization via Swin-Transformer
Feng Huang
Xin Zhang
Yixuan Xu
Xuesong Wang
Xianyu Wu
19
0
0
17 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Y. Guo
VGen
97
16
0
06 Jun 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
32
8
0
25 May 2024
Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features
Lichuan Ji
Yingqi Lin
Zhenhua Huang
Yan Han
Xiaogang Xu
Jiafei Wu
Chong Wang
Zhe Liu
43
3
0
24 May 2024
SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising
Guanyiman Fu
Fengchao Xiong
Jianfeng Lu
Jun Zhou
Mamba
24
19
0
02 May 2024
Deep Learning for Melt Pool Depth Contour Prediction From Surface Thermal Images via Vision Transformers
Francis Ogoke
P. Pak
Alexander J. Myers
Guadalupe Quirarte
Jack L. Beuth
Jonathan A. Malen
A. Farimani
AI4CE
ViT
14
2
0
26 Apr 2024
MCSDNet: Mesoscale Convective System Detection Network via Multi-scale Spatiotemporal Information
Jiajun Liang
Baoquan Zhang
Yunming Ye
Xutao Li
Chuyao Luo
Xukai Fu
24
0
0
26 Apr 2024
Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
Haosong Peng
Wei Feng
Hao Li
Yufeng Zhan
Qihua Zhou
Yuanqing Xia
11
2
0
14 Apr 2024
Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification
Jimmy Lin
Junkai Li
Jiasi Gao
Weizhi Ma
Yang Liu
12
0
0
21 Jan 2024
Learning to Visually Connect Actions and their Effects
Eric Peh
Paritosh Parmar
Basura Fernando
22
2
0
19 Jan 2024
Hyperspectral Image Denoising via Spatial-Spectral Recurrent Transformer
Guanyiman Fu
Fengchao Xiong
Jianfeng Lu
Jun Zhou
Jiantao Zhou
Yuntao Qian
ViT
14
11
0
31 Dec 2023
Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition
Xiao Wang
Yao Rong
Shiao Wang
Yuan Chen
Zhe Wu
Bowei Jiang
Yonghong Tian
Jin Tang
ViT
58
3
0
18 Dec 2023
SVQ: Sparse Vector Quantization for Spatiotemporal Forecasting
Chao Chen
Tian Zhou
Yanjun Zhao
Hui Liu
Liang Sun
Rong Jin
17
0
0
06 Dec 2023
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun V. Reddy
William Paul
Corban Rivera
Ketul Shah
Celso M. de Melo
Rama Chellappa
32
4
0
05 Dec 2023
Object-based (yet Class-agnostic) Video Domain Adaptation
Dantong Niu
Amir Bar
Roei Herzig
Trevor Darrell
Anna Rohrbach
19
1
0
29 Nov 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Bernard Ghanem
24
25
0
28 Nov 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
31
116
0
16 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
25
15
0
28 Sep 2023
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Bipin Rajendran
Bashir M. Al-Hashimi
MLLM
VLM
26
2
0
27 Sep 2023
1
2
3
Next