ResearchTrend.AI
A Closer Look at Spatiotemporal Convolutions for Action Recognition

30 November 2017
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri

Papers citing "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

50 / 1,270 papers shown
Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning
Shaobo Min
Qi Dai
Hongtao Xie
Chuang Gan
Yongdong Zhang
Jingdong Wang
SSL
23
7
0
13 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
27
11
0
12 Jun 2021
Space-time Mixing Attention for Video Transformer
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
36
124
0
10 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
30
274
0
09 Jun 2021
Tracking by Joint Local and Global Search: A Target-aware Attention based Approach
Tianlin Li
Jin Tang
Bin Luo
Yaowei Wang
Yonghong Tian
Feng Wu
30
29
0
09 Jun 2021
Neural Speaker Embeddings for Ultrasound-based Silent Speech Interfaces
Amin Honarmandi Shandiz
L. Tóth
G. Gosztolya
Alexandra Markó
Tamás Gábor Csapó
31
6
0
08 Jun 2021
How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild
Okan Kopuklu
Maja Taseska
Gerhard Rigoll
3DV
29
45
0
07 Jun 2021
Video Instance Segmentation using Inter-Frame Communication Transformers
Sukjun Hwang
Miran Heo
Seoung Wug Oh
Seon Joo Kim
ViT
33
135
0
07 Jun 2021
Transformed ROIs for Capturing Visual Transformations in Videos
Abhinav Rai
Fadime Sener
Angela Yao
ViT
24
3
0
06 Jun 2021
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
Yihong Dong
Ying Peng
Muqiao Yang
Songtao Lu
Qingjiang Shi
49
9
0
05 Jun 2021
ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency
Deng Huang
Wenhao Wu
Weiwen Hu
Xu Liu
Dongliang He
Zhihua Wu
Xiangmiao Wu
Ming Tan
Errui Ding
SSL
21
55
0
04 Jun 2021
Anticipative Video Transformer
Rohit Girdhar
Kristen Grauman
ViT
27
209
0
03 Jun 2021
CT-Net: Channel Tensorization Network for Video Classification
Kunchang Li
Xianhang Li
Yali Wang
Jun Wang
Yu Qiao
ViT
30
55
0
03 Jun 2021
TSI: Temporal Saliency Integration for Video Action Recognition
Haisheng Su
Kunchang Li
Jinyuan Feng
Dongliang Wang
Weihao Gan
Wei Wu
Yu Qiao
29
4
0
02 Jun 2021
Continual 3D Convolutional Neural Networks for Real-time Processing of Videos
Lukas Hedegaard
Alexandros Iosifidis
3DPC
25
14
0
31 May 2021
Classification of Brain Tumours in MR Images using Deep Spatiospatial Models
S. Chatterjee
F. Nizamani
A. Nürnberger
Oliver Speck
24
79
0
28 May 2021
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering
Sateesh Kumar
S. Haresh
Awais Ahmed
Andrey Konin
M. Zia
Quoc-Huy Tran
SSL
27
47
0
27 May 2021
Tracking Without Re-recognition in Humans and Machines
Drew Linsley
Girik Malik
Junkyung Kim
L. Govindarajan
E. Mingolla
Thomas Serre
31
18
0
27 May 2021
SSAN: Separable Self-Attention Network for Video Representation Learning
Xudong Guo
Xun Guo
Yan Lu
ViT
AI4TS
22
26
0
27 May 2021
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning
Wenhao Wu
Yuxiang Zhao
Yanwu Xu
Xiao Tan
Dongliang He
...
Jinxing Ye
Yingying Li
Mingde Yao
Zichao Dong
Yifeng Shi
AI4TS
30
27
0
25 May 2021
Temporal Action Proposal Generation with Transformers
Lining Wang
Haosen Yang
Wenhao Wu
Huanjin Yao
Hujie Huang
ViT
38
27
0
25 May 2021
Puck localization and multi-task event recognition in broadcast hockey videos
Kanav Vats
M. Fani
David A Clausi
John S. Zelek
15
12
0
21 May 2021
MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations
Taojiannan Yang
Sijie Zhu
Matías Mendieta
Pu Wang
Ravikumar Balakrishnan
Minwoo Lee
T. Han
M. Shah
Chong Chen
3DH
OOD
30
23
0
14 May 2021
Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation
Tianrui Hui
Shaofei Huang
Si Liu
Zihan Ding
Guanbin Li
Wenguan Wang
Jizhong Han
Fei Wang
20
46
0
14 May 2021
REGINA - Reasoning Graph Convolutional Networks in Human Action Recognition
Bruno Degardin
Vasco Lopes
Hugo Proença
3DH
GNN
38
10
0
14 May 2021
Event-LSTM: An Unsupervised and Asynchronous Learning-based Representation for Event-based Data
Lakshmi Annamalai
Vignesh Ramanathan
Chetan Singh Thakur
21
15
0
10 May 2021
Adaptive Focus for Efficient Video Recognition
Yulin Wang
Zhaoxi Chen
Haojun Jiang
Shiji Song
Yizeng Han
Gao Huang
45
98
0
07 May 2021
Unsupervised Visual Representation Learning by Tracking Patches in Video
Guangting Wang
Yizhou Zhou
Chong Luo
Wenxuan Xie
Wenjun Zeng
Zhiwei Xiong
SSL
37
24
0
06 May 2021
Audio Retrieval with Natural Language Queries
Andreea-Maria Oncescu
A. Sophia Koepke
João F. Henriques
Zeynep Akata
Samuel Albanie
21
77
0
05 May 2021
Motion-Augmented Self-Training for Video Recognition at Smaller Scale
Kirill Gavrilyuk
Mihir Jain
I. Karmanov
Cees G. M. Snoek
18
21
0
04 May 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
39
257
0
29 Apr 2021
Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization
Negin Ghamsarian
M. Taschwer
Doris Putzgruber-Adamitsch
Stephanie Sarny
Klaus Schoeffmann
19
15
0
29 Apr 2021
Revisiting Skeleton-based Action Recognition
Haodong Duan
Yue Zhao
Kai-xiang Chen
Dahua Lin
Bo Dai
3DH
35
486
0
28 Apr 2021
FrameExit: Conditional Early Exiting for Efficient Video Recognition
Amir Ghodrati
B. Bejnordi
A. Habibian
45
81
0
27 Apr 2021
Three-stream network for enriched Action Recognition
Ivaxi Sheth
24
4
0
27 Apr 2021
Joint Representation Learning and Novel Category Discovery on Single- and Multi-modal Data
Xu Jia
Kai Han
Yukun Zhu
Bradley Green
159
57
0
26 Apr 2021
VidTr: Video Transformer Without Convolutions
Yanyi Zhang
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Biagio Brattoli
Hao Chen
I. Marsic
Joseph Tighe
ViT
148
193
0
23 Apr 2021
Exploring Modality-shared Appearance Features and Modality-invariant Relation Features for Cross-modality Person Re-Identification
Nianchang Huang
Jianan Liu
Qiang Zhang
Jungong Han
28
40
0
23 Apr 2021
3D Convolutional Neural Networks for Ultrasound-Based Silent Speech Interfaces
L. Tóth
Amin Honarmandi Shandiz
28
11
0
23 Apr 2021
Skip-Convolutions for Efficient Video Processing
A. Habibian
Davide Abati
Taco S. Cohen
B. Bejnordi
54
50
0
23 Apr 2021
SportsCap: Monocular 3D Human Motion Capture and Fine-grained Understanding in Challenging Sports Videos
Xin Chen
Anqi Pang
Wei Yang
Yuexin Ma
Lan Xu
Jingyi Yu
149
56
0
23 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
251
577
0
22 Apr 2021
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Yanbei Chen
Yongqin Xian
A. Sophia Koepke
Ying Shan
Zeynep Akata
80
82
0
22 Apr 2021
Skimming and Scanning for Untrimmed Video Action Recognition
Yunyan Hong
Ailing Zeng
Min Li
Cewu Lu
Li Jiang
Qiang Xu
27
0
0
21 Apr 2021
Detection of Audio-Video Synchronization Errors Via Event Detection
Joshua Peter Ebenezer
Yongjun Wu
Hai Wei
S. Sethuraman
Z. Liu
37
12
0
20 Apr 2021
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
Yuan Zhi
Zhan Tong
Limin Wang
Gangshan Wu
TTA
19
72
0
20 Apr 2021
A cappella: Audio-visual Singing Voice Separation
Juan F. Montesinos
V. S. Kadandale
G. Haro
40
16
0
20 Apr 2021
Data-driven vehicle speed detection from synthetic driving simulator images
A. Martínez
Javier Díaz
Iván García Daza
David Fernández Llorca
20
6
0
20 Apr 2021
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition
Zejia Weng
Zuxuan Wu
Hengduo Li
Jingjing Chen
Yu-Gang Jiang
34
4
0
20 Apr 2021
Comparing Correspondences: Video Prediction with Correspondence-wise Losses
Daniel Geng
Max Hamilton
Andrew Owens
3DH
32
16
0
19 Apr 2021