1712.04851
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
13 December 2017
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
Papers citing "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification"
50 / 674 papers shown
Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition
Baoli Sun
Y. X. R. Wang
Xinzhu Ma
Zhihui Wang
Kun Lu
Zhiyong Wang
118
0
0
26 Nov 2025
Learning Skill-Attributes for Transferable Assessment in Video
Kumar Ashutosh
Kristen Grauman
85
0
0
17 Nov 2025
Do Blind Spots Matter for Word-Referent Mapping? A Computational Study with Infant Egocentric Video
Zekai Shi
Zhixi Cai
Kalin Stefanov
EgoV
80
0
0
13 Nov 2025
AdSum: Two-stream Audio-visual Summarization for Automated Video Advertisement Clipping
Wen Xie
Yanjun Zhu
Gijs Overgoor
Yakov Bart
Agata Lapedriza Garcia
Sarah Ostadabbas
55
0
0
30 Oct 2025
Seeing, Signing, and Saying: A Vision-Language Model-Assisted Pipeline for Sign Language Data Acquisition and Curation from Social Media
Shakib Yazdani
Yasser Hamidullah
C. España-Bonet
Josef van Genabith
SLR
150
1
0
29 Oct 2025
Sign Language Translation with Sentence Embedding Supervision
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yasser Hamidullah
Josef van Genabith
C. España-Bonet
SLR
204
11
0
22 Oct 2025
DGME-T: Directional Grid Motion Encoding for Transformer-Based Historical Camera Movement Classification
Tingyu Lin
Armin Dadras
Florian Kleber
Robert Sablatnig
VGen
33
0
0
17 Oct 2025
Prompt-guided Disentangled Representation for Action Recognition
Tianci Wu
Guangming Zhu
Jiang Lu
Siyuan Wang
Ning Wang
Nuoye Xiong
Zhang Liang
158
0
0
26 Sep 2025
Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic event Classification
Yuanjian Chen
Yang Xiao
Jinjie Huang
68
0
0
18 Sep 2025
Video Understanding by Design: How Datasets Shape Architectures and Insights
Lei Wang
Piotr Koniusz
Yongsheng Gao
3DV
VGen
AI4TS
189
0
0
11 Sep 2025
Aligning Moments in Time using Video Queries
Yogesh Kumar
Uday Agarwal
Manish Gupta
Anand Mishra
199
1
0
21 Aug 2025
CRAM: Large-scale Video Continual Learning with Bootstrapped Compression
Shivani Mall
Joao F. Henriques
CLL
VLM
96
0
0
07 Aug 2025
Hybrid Hypergraph Networks for Multimodal Sequence Data Classification
Feng Xu
Hui Wang
Yuting Huang
Danwei Zhang
Zizhu Fan
86
0
0
30 Jul 2025
Multi-Focus Temporal Shifting for Precise Event Spotting in Sports Videos
Hao Xu
Sam Wells
Mohamed Reda Bouadjenek
Richard Dazeley
224
1
0
10 Jul 2025
AI-Generated Video Detection via Perceptual Straightening
Christian Internò
Robert Geirhos
Markus Olhofer
Sunny Liu
Barbara Hammer
David Klindt
238
1
0
01 Jul 2025
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization
Xiaoqi Wang
Yi Wang
Lap-Pui Chau
140
1
0
17 Jun 2025
Enhancing Rating-Based Reinforcement Learning to Effectively Leverage Feedback from Large Vision-Language Models
Tung M. Luu
Younghwan Lee
Donghoon Lee
Sunho Kim
Min Jun Kim
Chang D. Yoo
ALM
VLM
140
6
0
15 Jun 2025
An Effective End-to-End Solution for Multimodal Action Recognition
International Conference on Pattern Recognition (ICPR), 2025
Songping Wang
Xiantao Hu
Yueming Lyu
Caifeng Shan
187
2
0
11 Jun 2025
Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation
Edward Fish
Richard Bowden
SLR
453
4
0
30 May 2025
Unsupervised Transcript-assisted Video Summarization and Highlight Detection
Spyros Barbakos
Charalampos Antoniadis
Gerasimos Potamianos
Gianluca Setti
OffRL
AI4TS
400
0
0
29 May 2025
CA3D: Convolutional-Attentional 3D Nets for Efficient Video Activity Recognition on the Edge
Gabriele Lagani
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato
122
1
0
26 May 2025
Advancing Video Self-Supervised Learning via Image Foundation Models
Pattern Recognition Letters (Pattern Recogn. Lett.), 2025
Jingwei Wu
Zhewei Huang
Chang Liu
152
0
0
25 May 2025
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
Jiahui Zhang
Yusen Luo
Abrar Anwar
Sumedh Anand Sontakke
Joseph J Lim
Jesse Thomason
Erdem Biyik
Jesse Zhang
OffRL
LM&Ro
344
15
0
16 May 2025
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture
Meng Cui
Xianghu Yue
Xinyuan Qian
Jinzheng Zhao
Haohe Liu
Xubo Liu
Daoliang Li
Wenwu Wang
295
1
0
21 Apr 2025
Text-Audio-Visual-conditioned Diffusion Model for Video Saliency Prediction
Li Yu
Xuanzhe Sun
Wei Zhou
Moncef Gabbouj
DiffM
167
1
0
19 Apr 2025
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hoshanfar
Alireza Hosseini
Ahmad Kalhor
Babak N. Araabi
889
0
0
14 Apr 2025
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Shristi Das Biswas
Efstathia Soufleri
Arani Roy
Kaushik Roy
238
1
0
17 Mar 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
285
0
0
11 Feb 2025
Imitation Learning from a Single Temporally Misaligned Video
William Huey
Huaxiaoyue Wang
Anne Wu
Yoav Artzi
Sanjiban Choudhury
AI4TS
293
1
0
08 Feb 2025
EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues
International Conference on Intelligent User Interfaces (IUI), 2025
Rohit Girmaji
Bhav Beri
Ramanathan Subramanian
Vineet Gandhi
VGen
354
1
0
04 Feb 2025
Minimalistic Video Saliency Prediction via Efficient Decoder & Spatio Temporal Action Cues
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Rohit Girmaji
Siddharth Jain
Bhav Beri
Sarthak Bansal
Vineet Gandhi
ViT
190
2
0
01 Feb 2025
BILLNET: A Binarized Conv3D-LSTM Network with Logic-gated residual architecture for hardware-efficient video inference
IEEE Workshop on Signal Processing Systems (SiPS), 2022
Van Thien Nguyen
William Guicquero
Gilles Sicard
3DV
MQ
261
2
0
24 Jan 2025
WhACC: Whisker Automatic Contact Classifier with Expert Human-Level Performance
bioRxiv (bioRxiv), 2023
Phillip Maire
Samson G. King
Jonathan Andrew Cheung
Stefanie Walker
Samuel Andrew Hires
299
0
0
06 Jan 2025
GFG -- Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda
Andrea Piergentili
Beatrice Savoldi
Marco Madeddu
Martina Rosola
Silvia Casola
Chiara Ferrando
V. Patti
Matteo Negri
L. Bentivogli
248
10
0
31 Dec 2024
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yulin Wang
Haoji Zhang
Yang Yue
Shiji Song
Chao Deng
Junlan Feng
Gao Huang
235
11
0
15 Dec 2024
Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Ali Mollaahmadi Dehaghi
Reza Razavi
Mohammad Moshirpour
257
3
0
12 Dec 2024
Relevance-guided Audio Visual Fusion for Video Saliency Prediction
Li Yu
Xuanzhe Sun
Pan Gao
Moncef Gabbouj
239
1
0
18 Nov 2024
AM Flow: Adapters for Temporal Processing in Action Recognition
Tanay Agrawal
Abid Ali
A. Dantcheva
François Brémond
199
0
0
04 Nov 2024
MM-WLAuslan: Multi-View Multi-Modal Word-Level Australian Sign Language Recognition Dataset
Neural Information Processing Systems (NeurIPS), 2024
Xin Shen
Heming Du
Hongwei Sheng
Shuyun Wang
Hui Chen
...
Xiaobiao Du
Shuyun Wang
Ruihan Lu
Qingzheng Xu
Xin Yu
SLR
156
10
0
25 Oct 2024
GenAI Assisting Medical Training
Stefan Gerd Fritsch
Matthias Tschoepe
Vitor Fortes Rey
Lars Krupp
Agnes Gruenerbl
Eloise Monger
Sarah Travenna
144
0
0
21 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
479
1
0
09 Oct 2024
Grounding is All You Need? Dual Temporal Grounding for Video Dialog
You Qin
Wei Ji
Xinze Lan
Hao Fei
Xun Yang
Dan Guo
Roger Zimmermann
Lizi Liao
VGen
225
2
0
08 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
Michael R. Lyu
Liwei Wang
VLM
152
6
0
08 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
International Conference on Learning Representations (ICLR), 2024
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Joey Tianyi Zhou
Koustuv Sinha
AI4TS
217
5
0
04 Oct 2024
REST-HANDS: Rehabilitation with Egocentric Vision Using Smartglasses for Treatment of Hands after Surviving Stroke
Wiktor Mucha
Kentaro Tanaka
M. Kampel
184
0
0
30 Sep 2024
Temporally Aligned Audio for Video with Autoregression
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ilpo Viertola
Vladimir E. Iashin
Esa Rahtu
VGen
168
41
0
20 Sep 2024
High-Order Evolving Graphs for Enhanced Representation of Traffic Dynamics
Aditya Humnabadkar
Arindam Sikdar
Benjamin Cave
Huaizhong Zhang
P. Bakaki
Ardhendu Behera
298
0
0
17 Sep 2024
KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance
Conference on Robot Learning (CoRL), 2024
Jingxian Lu
Wenke Xia
Dong Wang
Zhigang Wang
Bin Zhao
Di Hu
Xuelong Li
151
4
0
06 Aug 2024
Is 3D Convolution with 5D Tensors Really Necessary for Video Analysis?
Habib Hajimolahoseini
Walid Ahmed
Austin Wen
Yang Liu
182
0
0
23 Jul 2024
Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective
Changwen Zheng
Wenwen Qiang
Jianqi Zhang
Changwen Zheng
Jingyao Wang
SSL
220
0
0
19 Jul 2024