1812.05038
Long-Term Feature Banks for Detailed Video Understanding
12 December 2018
Chao-Yuan Wu
Christoph Feichtenhofer
Haoqi Fan
Kaiming He
Philipp Krahenbuhl
Ross B. Girshick
Papers citing
"Long-Term Feature Banks for Detailed Video Understanding"
50 / 306 papers shown
Title
Action tube generation by person query matching for spatio-temporal action detection
Kazuki Omi
Jion Oshima
Toru Tamaki
60
0
0
17 Mar 2025
Salient Temporal Encoding for Dynamic Scene Graph Generation
Zhihao Zhu
44
0
0
15 Mar 2025
Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment
Xiaowei Bi
Zheyuan Xu
53
1
0
12 Mar 2025
EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models
Haiyang Yu
Jinghui Lu
Yanjie Wang
Yang Li
H. Wang
Can Huang
B. Li
VLM
57
1
0
06 Mar 2025
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Shangzhe Di
Zhelun Yu
Guanghao Zhang
Haoyuan Li
Tao Zhong
Hao Cheng
Bolin Li
Wanggui He
Fangxun Shu
Hao Jiang
68
4
0
01 Mar 2025
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
Shuming Liu
Chen Zhao
Fatimah Zohra
Mattia Soldan
Alejandro Pardo
...
Juan Carlos León Alcázar
A. Cioppa
Silvio Giancola
Carlos Hinojosa
Bernard Ghanem
57
3
0
27 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
42
0
0
11 Feb 2025
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
Wentao Bao
K. Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM
ObjD
42
2
0
17 Nov 2024
HourVideo: 1-Hour Video-Language Understanding
Keshigeyan Chandrasegaran
Agrim Gupta
Lea M. Hadzic
Taran Kota
Jimming He
Cristobal Eyzaguirre
Zane Durante
Manling Li
Jiajun Wu
L. Fei-Fei
VLM
41
31
0
07 Nov 2024
AlphaChimp: Tracking and Behavior Recognition of Chimpanzees
Xiaoxuan Ma
Yutang Lin
Yuan Xu
Stephan P. Kaufhold
Jack Terwilliger
Andres Meza
Yixin Zhu
Federico Rossano
Yizhou Wang
34
0
0
22 Oct 2024
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models
Xiaohan Lan
Yitian Yuan
Zequn Jie
Lin Ma
VLM
21
2
0
15 Oct 2024
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Heqing Zou
Tianze Luo
Guiyang Xie
Victor Zhang
...
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
34
6
0
27 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
29
0
0
26 Sep 2024
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
Seok Hwan Lee
Taein Son
Soo Won Seo
Jisong Kim
Jun Won Choi
37
0
0
07 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
41
5
0
31 Jul 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Xiaowei Chi
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
67
7
0
30 Jul 2024
Classification Matters: Improving Video Action Detection with Class-Specific Attention
Jinsung Lee
Taeoh Kim
Inwoong Lee
Minho Shim
Dongyoon Wee
Minsu Cho
Suha Kwak
44
0
0
29 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
34
7
0
11 Jul 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
41
3
0
20 Jun 2024
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
34
0
0
11 Jun 2024
Streaming Long Video Understanding with Large Language Models
Rui Qian
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Shuangrui Ding
Dahua Lin
Jiaqi Wang
VLM
31
40
0
25 May 2024
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Feng Liang
Akio Kodaira
Chenfeng Xu
M. Tomizuka
Kurt Keutzer
Diana Marculescu
DiffM
VGen
70
7
0
24 May 2024
Open-Vocabulary Spatio-Temporal Action Detection
Tao Wu
Shuqiu Ge
Jie Qin
Gangshan Wu
Limin Wang
ObjD
23
5
0
17 May 2024
A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection
Matthew Korban
Peter Youngs
Scott T. Acton
ViT
27
6
0
13 May 2024
Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy
Hoang-Quan Nguyen
Thanh-Dat Truong
Khoa Luu
34
1
0
02 May 2024
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song
Wenhao Chai
Tianbo Ye
Jenq-Neng Hwang
Xi Li
Gaoang Wang
VLM
MLLM
24
28
0
26 Apr 2024
STMixer: A One-Stage Sparse Action Detector
Tao Wu
Mengqing Cao
Ziteng Gao
Gangshan Wu
Limin Wang
22
0
0
15 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
81
88
0
08 Apr 2024
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk
Jaesung Huh
Evangelos Kazakos
Andrew Zisserman
Dima Damen
33
8
0
08 Apr 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
44
3
0
06 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
60
56
0
04 Apr 2024
Language Model Guided Interpretable Video Action Reasoning
Ning Wang
Guangming Zhu
HS Li
Liang Zhang
Syed Afaq Ali Shah
Mohammed Bennamoun
46
2
0
02 Apr 2024
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
31
31
0
01 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
42
1
0
01 Apr 2024
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balažević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
97
22
0
08 Feb 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
F. Wörgötter
Alexander S. Ecker
28
3
0
29 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
14
5
0
18 Jan 2024
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
47
4
0
29 Dec 2023
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
100
80
0
28 Dec 2023
Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos
Houlun Chen
Xin Wang
Hong Chen
Zihan Song
Jia Jia
Wenwu Zhu
LRM
31
10
0
28 Dec 2023
MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding
Hongjie Zhang
Yi Liu
Lu Dong
Yifei Huang
Z. Ling
Yali Wang
Limin Wang
Yu Qiao
23
25
0
08 Dec 2023
Overcoming Label Noise for Source-free Unsupervised Video Domain Adaptation
A. Dasgupta
C. V. Jawahar
Karteek Alahari
TTA
VLM
11
10
0
30 Nov 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
51
1
0
30 Nov 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jiaming Zhou
Hanjun Li
Kun-Yu Lin
Junwei Liang
21
1
0
28 Nov 2023
Query by Activity Video in the Wild
Tao Hu
William Thong
Pascal Mettes
Cees G. M. Snoek
22
0
0
23 Nov 2023
Event Causality Is Key to Computational Story Understanding
Yidan Sun
Qin Chao
Boyang Albert Li
18
5
0
16 Nov 2023
Beyond still images: Temporal features and input variance resilience
AmirHosein Fadaei
M. Dehaqani
30
0
0
01 Nov 2023
Object-centric Video Representation for Long-term Action Anticipation
Ce Zhang
Changcheng Fu
Shijie Wang
Nakul Agarwal
Kwonjoon Lee
Chiho Choi
Chen Sun
15
14
0
31 Oct 2023
ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors
Xiaoxuan Ma
Stephan P. Kaufhold
Jiajun Su
Wentao Zhu
Jack Terwilliger
Andres Meza
Yixin Zhu
Federico Rossano
Yizhou Wang
21
13
0
25 Oct 2023
Flow Dynamics Correction for Action Recognition
Lei Wang
Piotr Koniusz
21
10
0
16 Oct 2023