Video Action Transformer Network


6 December 2018
Rohit Girdhar
João Carreira
Carl Doersch
Andrew Zisserman
    ViT

Papers citing "Video Action Transformer Network"

50 / 122 papers shown
Video Joint Modelling Based on Hierarchical Transformer for Co-summarization
Haopeng Li
Qiuhong Ke
Mingming Gong
Zhang Rui
ViT
26
22
0
27 Dec 2021
Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition
Sofia Broomé
Ernest Pokropek
Boyu Li
Hedvig Kjellström
13
7
0
22 Dec 2021
Distillation of Human-Object Interaction Contexts for Action Recognition
Muna Almushyti
Frederick W. Li
26
3
0
17 Dec 2021
Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition
Liangfei Zhang
Xiaopeng Hong
Ognjen Arandjelovic
Guoying Zhao
ViT
28
47
0
10 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
38
686
0
08 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge J. Belongie
Ming-Hsuan Yang
Hartwig Adam
Yin Cui
AI4TS
41
6
0
08 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
46
677
0
02 Dec 2021
Vision Pair Learning: An Efficient Training Framework for Image Classification
Bei Tong
Xiaoyuan Yu
ViT
17
0
0
02 Dec 2021
Conditional Object-Centric Learning from Video
Thomas Kipf
Gamaleldin F. Elsayed
Aravindh Mahendran
Austin Stone
S. Sabour
G. Heigold
Rico Jonschkowski
Alexey Dosovitskiy
Klaus Greff
OCL
39
214
0
24 Nov 2021
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Philip H. S. Torr
Guoying Zhao
ViT
MedIm
132
167
0
23 Nov 2021
Towards Tokenized Human Dynamics Representation
Kenneth Li
Xiao Sun
Zhirong Wu
Fangyun Wei
Stephen Lin
11
2
0
22 Nov 2021
Transformers for prompt-level EMA non-response prediction
Supriya Nagesh
Alexander Moreno
Stephanie M Carpenter
Jamie Yap
Soujanya Chatterjee
...
Santosh Kumar
Cho Lam
D. Wetter
Inbal Nahum-Shani
James M. Rehg
12
0
0
01 Nov 2021
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
17
82
0
13 Oct 2021
Context-LGM: Leveraging Object-Context Relation for Context-Aware Object Recognition
Mingzhou Liu
Xinwei Sun
Fandong Zhang
Yizhou Yu
Yizhou Wang
24
0
0
08 Oct 2021
Deep Learning-based Action Detection in Untrimmed Videos: A Survey
Elahe Vahdani
Yingli Tian
38
60
0
30 Sep 2021
Hierarchical Multimodal Transformer to Summarize Videos
Bin Zhao
Maoguo Gong
Xuelong Li
ViT
19
55
0
22 Sep 2021
Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation
Ziniu Wan
Zhengjia Li
Maoqing Tian
Jianbo Liu
Shuai Yi
Hongsheng Li
3DH
27
80
0
06 Sep 2021
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
Jiawei Chen
C. Ho
ViT
24
76
0
20 Aug 2021
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
38
179
0
17 Aug 2021
TVT: Transferable Vision Transformer for Unsupervised Domain Adaptation
Jinyu Yang
Jingjing Liu
N. Xu
Junzhou Huang
20
125
0
12 Aug 2021
Video Transformer for Deepfake Detection with Incremental Learning
Sohail Ahmed Khan
Hang Dai
ViT
10
62
0
11 Aug 2021
Learning Fair Face Representation With Progressive Cross Transformer
Yong Li
Yufei Sun
Zhen Cui
Shiguang Shan
Jian Yang
16
12
0
11 Aug 2021
UniCon: Unified Context Network for Robust Active Speaker Detection
Yuanhang Zhang
Susan Liang
Shuang Yang
Xiao-Chang Liu
Zhongqin Wu
Shiguang Shan
Xilin Chen
CVBM
18
36
0
05 Aug 2021
Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers
Piper Wolters
Logan Sizemore
Chris Daw
Brian Hutchinson
Lauren A. Phillips
27
11
0
28 Jul 2021
Spatial-Temporal Transformer for Dynamic Scene Graph Generation
Yuren Cong
Wentong Liao
H. Ackermann
Bodo Rosenhahn
M. Yang
ViT
11
122
0
26 Jul 2021
EAN: Event Adaptive Network for Enhanced Action Recognition
Yuan Tian
Yichao Yan
Guangtao Zhai
G. Guo
Zhiyong Gao
27
41
0
22 Jul 2021
Is attention to bounding boxes all you need for pedestrian action prediction?
Lina Achaji
Julien Moreau
Thibault Fouqueray
François Aioun
François Charpillet
18
30
0
16 Jul 2021
A Generative Model for Raw Audio Using Transformer Architectures
Prateek Verma
C. Chafe
8
28
0
30 Jun 2021
Spatio-Temporal Context for Action Detection
Manuel Sarmiento Calderó
David Varas
Elisenda Bou
19
2
0
29 Jun 2021
Towards Long-Form Video Understanding
Chaoxia Wu
Philipp Krahenbuhl
VLM
ViT
36
165
0
21 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
21
127
0
21 Jun 2021
VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living
Srijan Das
Rui Dai
Di Yang
F. Brémond
ViT
36
66
0
17 May 2021
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions
Yixuan Li
Lei Chen
Runyu He
Zhenzhi Wang
Gangshan Wu
Limin Wang
19
97
0
16 May 2021
Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich
Cordelia Schmid
Chen Sun
LM&Ro
24
193
0
13 May 2021
TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval
Yongbiao Chen
Shenmin Zhang
Fangxin Liu
Zhigang Chang
Mang Ye
Zhengwei Qi
ViT
27
48
0
05 May 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
19
1,221
0
22 Apr 2021
Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani
Steven Walton
Nikhil Shah
Abulikemu Abuduweili
Jiachen Li
Humphrey Shi
54
462
0
12 Apr 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
30
2,086
0
29 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
16
33
0
18 Mar 2021
Enhancing Transformer for Video Understanding Using Gated Multi-Level Attention and Temporal Adversarial Training
Saurabh Sahu
Palash Goyal
ViT
27
2
0
18 Mar 2021
TransFG: A Transformer Architecture for Fine-grained Recognition
Ju He
Jieneng Chen
Shuai Liu
Adam Kortylewski
Cheng Yang
Yutong Bai
Changhu Wang
ViT
33
375
0
14 Mar 2021
Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks
Ben Saunders
Necati Cihan Camgöz
Richard Bowden
SLR
22
77
0
11 Mar 2021
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
48
973
0
04 Mar 2021
Deep Deformation Detail Synthesis for Thin Shell Models
Lan Chen
Lin Gao
Jie Yang
Shibiao Xu
Juntao Ye
Xiaopeng Zhang
Yu-Kun Lai
3DH
AI4CE
22
11
0
23 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
280
1,981
0
09 Feb 2021
Mind the Gap: Assessing Temporal Generalization in Neural Language Models
Angeliki Lazaridou
A. Kuncoro
E. Gribovskaya
Devang Agrawal
Adam Liska
...
Sebastian Ruder
Dani Yogatama
Kris Cao
Susannah Young
Phil Blunsom
VLM
30
207
0
03 Feb 2021
U-LanD: Uncertainty-Driven Video Landmark Detection
Mohammad Jafari
C. Luong
Michael Y. Tsang
A. Gu
N. V. Woudenberg
R. Rohling
T. Tsang
Purang Abolmaesumi
22
12
0
02 Feb 2021
Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition
Zachary Wharton
Ardhendu Behera
Yonghuai Liu
Nikolaos Bessis
39
35
0
17 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
F. Khan
M. Shah
ViT
227
2,428
0
04 Jan 2021
TransTrack: Multiple Object Tracking with Transformer
Pei Sun
Jinkun Cao
Yi-Xin Jiang
Rufeng Zhang
Enze Xie
Zehuan Yuan
Changhu Wang
Ping Luo
ViT
VOT
243
565
0
31 Dec 2020