A Closer Look at Spatiotemporal Convolutions for Action Recognition


30 November 2017
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri

Papers citing "A Closer Look at Spatiotemporal Convolutions for Action Recognition"

50 / 1,270 papers shown
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
38
54
0
06 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
39
0
0
03 Dec 2022
Normalized Contrastive Learning for Text-Video Retrieval
Yookoon Park
Mahmoud Azab
Bo Xiong
Seungwhan Moon
Florian Metze
Gourab Kundu
Kirmani Ahmed
30
11
0
30 Nov 2022
Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
Chen Zhao
Shuming Liu
K. Mangalam
Guohao Li
38
17
0
25 Nov 2022
Towards Good Practices for Missing Modality Robust Action Recognition
Sangmin Woo
Sumin Lee
Yeonju Park
Muhammad Adi Nugroho
Changick Kim
22
43
0
25 Nov 2022
Hand Guided High Resolution Feature Enhancement for Fine-Grained Atomic Action Segmentation within Complex Human Assemblies
Matthew Kent Myers
Nick Wright
Stephen McGough
Nicholas Martin
14
1
0
24 Nov 2022
Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground
Haoxin Li
Yuan Liu
Hanwang Zhang
Boyang Li
30
15
0
23 Nov 2022
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Tsu-jui Fu
Licheng Yu
Ning Zhang
Cheng-Yang Fu
Jong-Chyi Su
William Yang Wang
Sean Bell
VGen
61
37
0
23 Nov 2022
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
Guoxi Huang
A. Bors
27
1
0
23 Nov 2022
Data Leakage and Evaluation Issues in Micro-Expression Analysis
Tuomas Varanka
Yante Li
Wei Peng
Guoying Zhao
AAML
31
5
0
21 Nov 2022
MagicVideo: Efficient Video Generation With Latent Diffusion Models
Daquan Zhou
Weimin Wang
Hanshu Yan
Weiwei Lv
Yizhe Zhu
Jiashi Feng
DiffM
VGen
41
373
0
20 Nov 2022
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors
Tianlin Li
Zong-Yao Wu
Bowei Jiang
Zhimin Bao
Lin Zhu
Guoqiu Li
Yaowei Wang
Yonghong Tian
31
36
0
17 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
30
107
0
17 Nov 2022
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
Guo Chen
Sen Xing
Zhe Chen
Yi Wang
Kunchang Li
...
Hongjie Zhang
Tong Lu
Yali Wang
Limin Wang
Yu Qiao
41
46
0
17 Nov 2022
A Unified Multimodal De- and Re-coupling Framework for RGB-D Motion Recognition
Benjia Zhou
Pichao Wang
Jun Wan
Yan-Ni Liang
Fan Wang
37
17
0
16 Nov 2022
Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022
Yin-Dong Zheng
Guo Chen
Jiahao Wang
Tong Lu
Limin Wang
45
0
0
16 Nov 2022
Dynamic Temporal Filtering in Video Models
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Chong-Wah Ngo
Tao Mei
AI4TS
32
17
0
15 Nov 2022
Learning to Model Multimodal Semantic Alignment for Story Visualization
Bowen Li
Thomas Lukasiewicz
DiffM
31
2
0
14 Nov 2022
Learning from partially labeled data for multi-organ and tumor segmentation
Yutong Xie
Jianpeng Zhang
Yong-quan Xia
Chunhua Shen
MedIm
ViT
39
18
0
13 Nov 2022
Deep Unsupervised Key Frame Extraction for Efficient Video Classification
Hao Tang
L. Ding
Songsong Wu
Bin Ren
N. Sebe
Paolo Rota
22
27
0
12 Nov 2022
MARLIN: Masked Autoencoder for facial video Representation LearnINg
Zhixi Cai
Shreya Ghosh
Kalin Stefanov
Abhinav Dhall
Jianfei Cai
Hamid Rezatofighi
Reza Haffari
Munawar Hayat
ViT
CVBM
27
60
0
12 Nov 2022
Soft-Landing Strategy for Alleviating the Task Discrepancy Problem in Temporal Action Localization Tasks
Hyolim Kang
Hanjung Kim
Joungbin An
Minsu Cho
Seon Joo Kim
38
5
0
11 Nov 2022
Two-stream Multi-dimensional Convolutional Network for Real-time Violence Detection
Diponkar Ghosh
Amitabha Chakrabarty
19
4
0
08 Nov 2022
Eat-Radar: Continuous Fine-Grained Intake Gesture Detection Using FMCW Radar and 3D Temporal Convolutional Network with Attention
C. Wang
T. S. Kumar
W. de Raedt
Guido Camps
Hans Hallez
Bart Vanrumste
24
12
0
08 Nov 2022
Egocentric Audio-Visual Noise Suppression
Roshan S. Sharma
Weipeng He
Ju Lin
Egor Lakomkin
Yang Liu
Kaustubh Kalgaonkar
EgoV
24
1
0
07 Nov 2022
Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning
Yixuan Pei
Zhiwu Qing
Jun Cen
Xiang Wang
Shiwei Zhang
Yaxiong Wang
Mingqian Tang
Nong Sang
Xueming Qian
27
13
0
02 Nov 2022
No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration
Jose Vargas-Quiros
Laura Cabrera-Quiros
Hayley Hung
29
1
0
01 Nov 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B. Rangrej
Kevin J Liang
Tal Hassner
James J. Clark
27
3
0
24 Oct 2022
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Benoit Steiner
Mostafa Elhoushi
Jacob Kahn
James Hegarty
31
8
0
24 Oct 2022
Learning a Grammar Inducer from Massive Uncurated Instructional Videos
Songyang Zhang
Linfeng Song
Lifeng Jin
Haitao Mi
Kun Xu
Dong Yu
Jiebo Luo
38
5
0
22 Oct 2022
Cyclical Self-Supervision for Semi-Supervised Ejection Fraction Prediction from Echocardiogram Videos
Weihang Dai
Xuelong Li
Xinpeng Ding
Kwang-Ting Cheng
46
22
0
20 Oct 2022
Facial Expression Video Generation Based-On Spatio-temporal Convolutional GAN: FEV-GAN
Hamza Bouzid
Lahoucine Ballihi
CVBM
30
9
0
20 Oct 2022
MovieCLIP: Visual Scene Recognition in Movies
Digbalay Bose
Rajat Hebbar
Krishna Somandepalli
Haoyang Zhang
Huayu Chen
K. Cole-McLaughlin
Haoran Wang
Shrikanth Narayanan
CLIP
22
21
0
20 Oct 2022
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
Mantas Mazeika
Eric Tang
Andy Zou
Steven Basart
Jun Shern Chan
Dawn Song
David A. Forsyth
Jacob Steinhardt
Dan Hendrycks
50
8
0
18 Oct 2022
MaSS: Multi-attribute Selective Suppression
Chun-Fu Chen
Shaohan Hu
Zhong-Zhi Shi
Prateek Gulati
Bill Moriarty
Marco Pistoia
Vincenzo Piuri
P. Samarati
CVBM
21
4
0
18 Oct 2022
ATCON: Attention Consistency for Vision Models
Ali Mirzazadeh
Florian Dubost
M. Pike
Krish Maniar
Max Zuo
Christopher Lee-Messer
D. Rubin
13
1
0
18 Oct 2022
Real-Time Driver Monitoring Systems through Modality and View Analysis
Yiming Ma
Victor Sanchez
S. Nikan
Devesh Upadhyay
Bhushan Atote
T. Guha
22
6
0
17 Oct 2022
Video in 10 Bits: Few-Bit VideoQA for Efficiency and Privacy
Shiyuan Huang
Robinson Piramuthu
Shih-Fu Chang
Gunnar A. Sigurdsson
28
1
0
15 Oct 2022
Linear Video Transformer with Feature Fixation
Kaiyue Lu
Zexia Liu
Jianyuan Wang
Weixuan Sun
Zhen Qin
...
Xuyang Shen
Huizhong Deng
Xiaodong Han
Yuchao Dai
Yiran Zhong
35
4
0
15 Oct 2022
STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition
Dasom Ahn
Sangwon Kim
H. Hong
ByoungChul Ko
ViT
31
97
0
14 Oct 2022
S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces
Eric N. D. Nguyen
Karan Goel
Albert Gu
Gordon W. Downs
Preey Shah
Tri Dao
S. Baccus
Christopher Ré
VLM
22
39
0
12 Oct 2022
DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition
Haodong Duan
Jiaqi Wang
Kai-xiang Chen
Dahua Lin
38
42
0
12 Oct 2022
Match Cutting: Finding Cuts with Smooth Visual Transitions
Boris Chen
Amir Ziai
Rebecca Tucker
Yuchen Xie
VGen
28
14
0
11 Oct 2022
It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training
Yuxin Song
Min Yang
Wenhao Wu
Dongliang He
Fu Li
Jingdong Wang
ViT
103
8
0
11 Oct 2022
SimPer: Simple Self-Supervised Learning of Periodic Targets
Yuzhe Yang
Xin Liu
Jiang Wu
Silviu Borac
Dina Katabi
M. Poh
Daniel J. McDuff
41
45
0
06 Oct 2022
What Should the System Do Next?: Operative Action Captioning for Estimating System Actions
Taiki Nakamura
Seiya Kawano
Akishige Yuguchi
Yasutomo Kawanishi
Koichiro Yoshino
14
0
0
06 Oct 2022
A case study of spatiotemporal forecasting techniques for weather forecasting
Shakir Showkat Sofi
Ivan Oseledets
35
4
0
29 Sep 2022
Streaming Video Temporal Action Segmentation In Real Time
Wu Wen
Yunheng Li
Zhuben Dong
Lin Feng
Wanxiao Yang
Shenlan Liu
33
3
0
28 Sep 2022
Learning State-Aware Visual Representations from Audible Interactions
Himangi Mittal
Pedro Morgado
Unnat Jain
Abhinav Gupta
78
23
0
27 Sep 2022
AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition
Yulin Wang
Yang Yue
Xin-Wen Xu
Ali Hassani
V. Kulikov
Nikita Orlov
S. Song
Humphrey Shi
Gao Huang
32
17
0
27 Sep 2022