1711.11248
A Closer Look at Spatiotemporal Convolutions for Action Recognition
30 November 2017
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
Papers citing
"A Closer Look at Spatiotemporal Convolutions for Action Recognition"
50 / 1,270 papers shown
Title
Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition
Runduo Han
Xiuping Liu
Shangxuan Yi
Yi Zhang
Hongchen Tan
6
0
0
17 May 2025
BandRC: Band Shifted Raised Cosine Activated Implicit Neural Representations
Pandula Thennakoon
Avishka Ranasinghe
Mario De Silva
Buwaneka Epakanda
Roshan Godaliyadda
Parakrama Ekanayake
Vijitha Herath
7
0
0
16 May 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François Germain
Michael Jeffrey Jones
Moitreya Chatterjee
26
0
0
14 May 2025
DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer
Ho-Joong Kim
Y. E. Lee
Jung-Ho Hong
Seong-Whan Lee
47
0
0
09 May 2025
Action Spotting and Precise Event Detection in Sports: Datasets, Methods, and Challenges
Hao Xu
Arbind Agrahari Baniya
Sam Well
Mohamed Reda Bouadjenek
Richard Dazeley
S. Aryal
AI4TS
29
0
0
06 May 2025
A simple and effective approach for body part recognition on CT scans based on projection estimation
Franko Hrzic
Mohammadreza Movahhedi
Ophelie Lavoie-Gagne
Ata Kiapour
46
0
0
30 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
84
0
0
28 Apr 2025
Hierarchical and Multimodal Data for Daily Activity Understanding
Ghazal Kaviani
Yavuz Yarici
Seulgi Kim
Mohit Prabhushankar
Ghassan AlRegib
Mashhour Solh
Ameya Patil
57
0
0
24 Apr 2025
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng
Songyou Peng
Kyle Genova
Gordon Wetzstein
Noah Snavely
Leonidas J. Guibas
Thomas Funkhouser
HAI
205
0
0
11 Apr 2025
A Large-Scale Analysis on Contextual Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Yogesh S Rawat
SSL
250
0
0
08 Apr 2025
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Piyush Bagad
Hazel Doughty
Bernard Ghanem
Cees G. M. Snoek
ViT
SSL
52
0
0
08 Apr 2025
GMR-Conv: An Efficient Rotation and Reflection Equivariant Convolution Kernel Using Gaussian Mixture Rings
Yuexi Du
Jiazhen Zhang
Nicha Dvornek
J. Onofrey
AAML
53
0
0
03 Apr 2025
SocialGesture: Delving into Multi-person Gesture Understanding
Xu Cao
Pranav Virupaksha
Wenqi Jia
Bolin Lai
Fiona Ryan
Sangmin Lee
James M. Rehg
SLR
56
0
0
03 Apr 2025
FLAMES: A Hybrid Spiking-State Space Model for Adaptive Memory Retention in Event-Based Learning
Biswadeep Chakraborty
Saibal Mukhopadhyay
51
0
0
02 Apr 2025
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?
Shreyank N. Gowda
Boyan Gao
Xiao Gu
Xiaobo Jin
VLM
41
0
0
02 Apr 2025
FDDet: Frequency-Decoupling for Boundary Refinement in Temporal Action Detection
Xinnan Zhu
Yicheng Zhu
Tixin Chen
Wentao Wu
Yuanjie Dang
49
0
0
01 Apr 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
56
0
0
30 Mar 2025
OwlSight: A Robust Illumination Adaptation Framework for Dark Video Human Action Recognition
Shihao Cheng
Jinlu Zhang
Yue Liu
Zhigang Tu
VLM
39
0
0
30 Mar 2025
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Jiaheng Zhou
Yanfeng Zhou
Wei Fang
Yuxing Tang
Le Lu
Ge Yang
Mamba
253
0
0
26 Mar 2025
A Spatiotemporal Radar-Based Precipitation Model for Water Level Prediction and Flood Forecasting
Sakshi Dhankhar
Stefan H. A. Wittek
Hamidreza Eivazi
Andreas Rausch
43
0
0
25 Mar 2025
Joint Self-Supervised Video Alignment and Action Segmentation
Ali Shah Ali
Syed Ahmed Mahmood
Mubin Saeed
Andrey Konin
M. Zia
Quoc-Huy Tran
OT
75
0
0
21 Mar 2025
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
Zichen Liu
Kunlun Xu
Bing-Huang Su
Xu Zou
Yuxin Peng
Jiahuan Zhou
VLM
AI4TS
71
1
0
20 Mar 2025
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
E. Shaar
Ariel Shaulov
Gal Chechik
Lior Wolf
VLM
41
0
0
17 Mar 2025
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
Yunze Liu
Peiran Wu
C. Liang
Junxiao Shen
Limin Wang
Li Yi
Mamba
56
0
0
16 Mar 2025
Domain Generalization for Improved Human Activity Recognition in Office Space Videos Using Adaptive Pre-processing
Partho Ghosh
Raisa Bentay Hossain
Mohammad Zunaed
Taufiq Hasan
58
0
0
16 Mar 2025
A Large-Scale Study on Video Action Dataset Condensation
Yang Chen
Sheng Guo
Bo Zheng
Limin Wang
DD
81
2
0
13 Mar 2025
STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications
Andrew Gao
Jun Liu
AI4TS
58
0
0
11 Mar 2025
ICPR 2024 Competition on Rider Intention Prediction
Shankar Gangisetty
Abdul Wasi
Shyam Nandan Rai
C. V. Jawahar
Sajay Raj
...
Ayesha Choudhary
Aaryadev Chandra
Dev Chandan
Shireen Chand
Suvaditya Mukherjee
53
0
0
11 Mar 2025
VoD: Learning Volume of Differences for Video-Based Deepfake Detection
Ying Xu
Marius Pedersen
Kiran Raja
41
0
0
10 Mar 2025
Q-PART: Quasi-Periodic Adaptive Regression with Test-time Training for Pediatric Left Ventricular Ejection Fraction Regression
Jie Liu
Tiexin Qin
Hui Liu
Yilei Shi
Lichao Mou
Xiao Xiang Zhu
Shiqi Wang
Haoliang Li
65
0
0
06 Mar 2025
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
Shuming Liu
Chen Zhao
Fatimah Zohra
Mattia Soldan
Alejandro Pardo
...
Juan Carlos León Alcázar
A. Cioppa
Silvio Giancola
Carlos Hinojosa
Bernard Ghanem
68
3
0
27 Feb 2025
Robust Dynamic Facial Expression Recognition
Feng Liu
Hanyang Wang
Siyuan Shen
50
1
0
22 Feb 2025
Natural Language Generation from Visual Sequences: Challenges and Future Directions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
263
0
0
18 Feb 2025
Benchmarking Zero-Shot Facial Emotion Annotation with Large Language Models: A Multi-Class and Multi-Frame Approach in DailyLife
He Zhang
Xinyi Fu
CVBM
50
2
0
18 Feb 2025
Variable-frame CNNLSTM for Breast Nodule Classification using Ultrasound Videos
Xiangxiang Cui
Zhongyu Li
Xiayue Fan
Peng Huang
Ying Wang
Meng Yang
S. Chang
Jihua Zhu
38
0
0
17 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
45
0
0
11 Feb 2025
MD-BERT: Action Recognition in Dark Videos via Dynamic Multi-Stream Fusion and Temporal Modeling
Sharana Dharshikgan Suresh Dass
H. Barua
Ganesh Krishnasamy
Raveendran Paramesran
Raphael C.-W. Phan
69
0
0
06 Feb 2025
BILLNET: A Binarized Conv3D-LSTM Network with Logic-gated residual architecture for hardware-efficient video inference
Van Thien Nguyen
William Guicquero
Gilles Sicard
3DV
MQ
82
2
0
24 Jan 2025
Multi-stage intermediate fusion for multimodal learning to classify non-small cell lung cancer subtypes from CT and PET
Fatih Aksu
Fabrizia Gelardi
Arturo Chiti
Paolo Soda
26
0
0
21 Jan 2025
SRE-Conv: Symmetric Rotation Equivariant Convolution for Biomedical Image Classification
Yuexi Du
Jiazhen Zhang
Tal Zeevi
Nicha Dvornek
J. Onofrey
61
1
0
17 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
69
24
0
31 Dec 2024
Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions
Tongfei Bian
Yiming Ma
Mathieu Chollet
Victor Sanchez
T. Guha
EgoV
101
1
0
21 Dec 2024
DriveGazen: Event-Based Driving Status Recognition using Conventional Camera
Xiaoyin Yang
70
0
0
16 Dec 2024
Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing
Pengcheng Zhao
Jinxing Zhou
Yang Zhao
Dan Guo
Yanxiang Chen
90
2
0
15 Dec 2024
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
Yulin Wang
Haoji Zhang
Yang Yue
Shiji Song
Chao Deng
Junlan Feng
Gao Huang
86
3
0
15 Dec 2024
Video Representation Learning with Joint-Embedding Predictive Architectures
Katrina Drozdov
Ravid Shwartz-Ziv
Yann LeCun
AI4TS
82
2
0
14 Dec 2024
Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning
Meng Shen
Yake Wei
Jianxiong Yin
D. Rajan
D. Hu
Simon See
86
0
0
12 Dec 2024
Annotation Techniques for Judo Combat Phase Classification from Tournament Footage
Anthony Miyaguchi
Jed Moutahir
Tanmay Sutar
77
0
0
10 Dec 2024
Streaming Detection of Queried Event Start
Cristobal Eyzaguirre
Eric Tang
S. Buch
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
79
0
0
04 Dec 2024
From Diffusion to Resolution: Leveraging 2D Diffusion Models for 3D Super-Resolution Task
Bohao Chen
Yujie Zhang
Yanan Lv
Hua Han
Xi Chen
MedIm
74
1
0
25 Nov 2024