1812.05038
Long-Term Feature Banks for Detailed Video Understanding
12 December 2018
Chao-Yuan Wu
Christoph Feichtenhofer
Haoqi Fan
Kaiming He
Philipp Krahenbuhl
Ross B. Girshick
Papers citing
"Long-Term Feature Banks for Detailed Video Understanding"
50 / 314 papers shown
Title
Beyond Real versus Fake Towards Intent-Aware Video Analysis
Saurabh Atreya
Nabyl Quignon
Baptiste Chopin
Abhijit Das
A. Dantcheva
AAML
68
0
0
27 Nov 2025
Benchmarking and Improving LVLMs on Event Extraction from Multimedia Documents
Fuyu Xing
Zimu Wang
Wei Wang
Haiyang Zhang
VLM
68
0
0
16 Sep 2025
Domain-Adaptive Pretraining Improves Primate Behavior Recognition
Felix B. Mueller
Timo Lueddecke
Richard Vogg
Alexander S. Ecker
105
1
0
15 Sep 2025
Generative Model-Based Feature Attention Module for Video Action Analysis
G. Wang
Peng Zhao
Cong Zhao
Jing Huang
Siyan Guo
Shusen Yang
116
0
0
19 Aug 2025
Generic Event Boundary Detection via Denoising Diffusion
Jaejun Hwang
Dayoung Gong
Manjin Kim
Minsu Cho
DiffM
133
0
0
16 Aug 2025
Fine-Tuning Video Transformers for Word-Level Bangla Sign Language: A Comparative Analysis for Classification Tasks
Jubayer Ahmed Bhuiyan Shawon
H. Mahmud
Kamrul Hasan
132
0
0
04 Jun 2025
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
Computer Vision and Pattern Recognition (CVPR), 2025
Zijia Lu
A S M Iftekhar
Gaurav Mittal
Tianjian Meng
Xiawei Wang
Cheng Zhao
Rohith Kukkala
Ehsan Elhamifar
Mei Chen
232
3
0
22 May 2025
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
Yoshihiro Yamada
ViT
275
0
0
09 Apr 2025
Action tube generation by person query matching for spatio-temporal action detection
Kazuki Omi
Jion Oshima
Toru Tamaki
363
0
0
17 Mar 2025
Salient Temporal Encoding for Dynamic Scene Graph Generation
Zhihao Zhu
236
0
0
15 Mar 2025
Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment
Xiaowei Bi
Zheyuan Xu
345
3
0
12 Mar 2025
EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models
Haiyang Yu
Jinghui Lu
Yanjie Wang
Yang Li
Han Wang
...
B. Li
Teng Fu
Yang Liu
J. Liu
H. Chen
VLM
265
5
0
06 Mar 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
International Conference on Learning Representations (ICLR), 2025
Shangzhe Di
Zhelun Yu
Guanghao Zhang
Haoyuan Li
Tao Zhong
Hao Cheng
Bolin Li
Wanggui He
Fangxun Shu
Hao Jiang
201
35
0
01 Mar 2025
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
Shuming Liu
Chen Zhao
Fatimah Zohra
Mattia Soldan
Alejandro Pardo
...
Juan Carlos León Alcázar
A. Cioppa
Silvio Giancola
Carlos Hinojosa
Bernard Ghanem
277
5
0
27 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
309
0
0
11 Feb 2025
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Wentao Bao
Keqin Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM
ObjD
276
7
0
17 Nov 2024
HourVideo: 1-Hour Video-Language Understanding
Neural Information Processing Systems (NeurIPS), 2024
Keshigeyan Chandrasegaran
Agrim Gupta
Lea M. Hadzic
Taran Kota
Jimming He
Cristobal Eyzaguirre
Zane Durante
Pengfei Yu
Jiajun Wu
L. Fei-Fei
VLM
277
82
0
07 Nov 2024
AlphaChimp: Tracking and Behavior Recognition of Chimpanzees
Xiaoxuan Ma
Yutang Lin
Yuan Xu
Stephan P. Kaufhold
Jack Terwilliger
Andres Meza
Yixin Zhu
Federico Rossano
Yizhou Wang
429
4
0
22 Oct 2024
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models
Xiaohan Lan
Yitian Yuan
Zequn Jie
Lin Ma
VLM
169
4
0
15 Oct 2024
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Heqing Zou
Tianze Luo
Guiyang Xie
Victor Zhang
...
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
289
18
0
27 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
252
0
0
26 Sep 2024
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
Seok Hwan Lee
Taein Son
Soo Won Seo
Jisong Kim
Jun Won Choi
327
1
0
07 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
235
12
0
31 Jul 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
285
8
0
30 Jul 2024
Classification Matters: Improving Video Action Detection with Class-Specific Attention
European Conference on Computer Vision (ECCV), 2024
Jinsung Lee
Taeoh Kim
Inwoong Lee
Minho Shim
Dongyoon Wee
Minsu Cho
Suha Kwak
376
1
0
29 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
269
23
0
11 Jul 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
406
17
0
20 Jun 2024
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
200
0
0
11 Jun 2024
Streaming Long Video Understanding with Large Language Models
Rui Qian
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Shuangrui Ding
Dahua Lin
Yuan Liu
VLM
223
110
0
25 May 2024
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Feng Liang
Akio Kodaira
Chenfeng Xu
Masayoshi Tomizuka
Kurt Keutzer
Diana Marculescu
DiffM
VGen
429
18
0
24 May 2024
Open-Vocabulary Spatio-Temporal Action Detection
Tao Wu
Shuqiu Ge
Jie Qin
Gangshan Wu
Limin Wang
ObjD
191
9
0
17 May 2024
A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Matthew Korban
Peter Youngs
Scott T. Acton
ViT
230
13
0
13 May 2024
Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy
Hoang-Quan Nguyen
Thanh-Dat Truong
Khoa Luu
275
1
0
02 May 2024
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song
Wenhao Chai
Tianbo Ye
Lei Li
Xi Li
Gaoang Wang
VLM
MLLM
239
51
0
26 Apr 2024
STMixer: A One-Stage Sparse Action Detector
Tao Wu
Mengqing Cao
Ziteng Gao
Gangshan Wu
Limin Wang
210
0
0
15 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
344
176
0
08 Apr 2024
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk
Jaesung Huh
Evangelos Kazakos
Andrew Zisserman
Dima Damen
278
24
0
08 Apr 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
291
14
0
06 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
European Conference on Computer Vision (ECCV), 2024
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
336
123
0
04 Apr 2024
Language Model Guided Interpretable Video Action Reasoning
Computer Vision and Pattern Recognition (CVPR), 2024
Ning Wang
Guangming Zhu
HS Li
Liang Zhang
Syed Afaq Ali Shah
Mohammed Bennamoun
211
7
0
02 Apr 2024
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
253
72
0
01 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
227
3
0
01 Apr 2024
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balažević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
443
46
0
08 Feb 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
Florentin Wörgötter
Alexander S. Ecker
388
14
0
29 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
405
9
0
18 Jan 2024
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Computer Vision and Pattern Recognition (CVPR), 2023
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
246
7
0
29 Dec 2023
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
371
150
0
28 Dec 2023
Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos
Houlun Chen
Xin Wang
Hong Chen
Zihan Song
Jia Jia
Wenwu Zhu
LRM
224
18
0
28 Dec 2023
LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering
Hongjie Zhang
Lu Dong
Yi Liu
Yifei Huang
Z. Ling
Yali Wang
Limin Wang
319
22
0
08 Dec 2023
Source-free Video Domain Adaptation by Learning from Noisy Labels
Pattern Recognition (Pattern Recogn.), 2023
A. Dasgupta
C. V. Jawahar
Karteek Alahari
TTA
VLM
445
13
0
30 Nov 2023