Long-Term Feature Banks for Detailed Video Understanding

12 December 2018
Chao-Yuan Wu
Christoph Feichtenhofer
Haoqi Fan
Kaiming He
Philipp Krahenbuhl
Ross B. Girshick

Papers citing "Long-Term Feature Banks for Detailed Video Understanding"

50 / 315 papers shown
VidTr: Video Transformer Without Convolutions
IEEE International Conference on Computer Vision (ICCV), 2021
Yanyi Zhang
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Biagio Brattoli
Hao Chen
I. Marsic
Joseph Tighe
ViT
418
217
0
23 Apr 2021
Multiscale Vision Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
481
1,513
0
22 Apr 2021
H2O: Two Hands Manipulating Objects for First Person Interaction Recognition
IEEE International Conference on Computer Vision (ICCV), 2021
Taein Kwon
Bugra Tekin
Jan Stühmer
Federica Bogo
Marc Pollefeys
EgoV
375
234
0
22 Apr 2021
Temporal Query Networks for Fine-grained Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2021
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
254
98
0
19 Apr 2021
Spatiotemporal Deformable Scene Graphs for Complex Activity Detection
British Machine Vision Conference (BMVC), 2021
Salman Khan
Fabio Cuzzolin
3DPC
238
5
0
16 Apr 2021
Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories
Computer Vision and Pattern Recognition (CVPR), 2021
Xitong Yang
Haoqi Fan
Lorenzo Torresani
L. Davis
Heng Wang
VLM
183
23
0
02 Apr 2021
Visual Semantic Role Labeling for Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2021
Arka Sadhu
Tanmay Gupta
Mark Yatskar
Ram Nevatia
Aniruddha Kembhavi
VLM
290
88
0
02 Apr 2021
TubeR: Tubelet Transformer for Video Action Detection
Computer Vision and Pattern Recognition (CVPR), 2021
Jiaojiao Zhao
Yanyi Zhang
Xinyu Li
Hao Chen
Shuai Bing
...
Yuanjun Xiong
Davide Modolo
I. Marsic
Cees G. M. Snoek
Joseph Tighe
ViT
344
92
0
02 Apr 2021
Motion Guided Attention Fusion to Recognize Interactions from Videos
IEEE International Conference on Computer Vision (ICCV), 2021
Tae Soo Kim
Jonathan D. Jones
Gregory Hager
103
19
0
01 Apr 2021
Learning Representational Invariances for Data-Efficient Action Recognition
Computer Vision and Image Understanding (CVIU), 2021
Yuliang Zou
Jinwoo Choi
Qitong Wang
Jia-Bin Huang
312
45
0
30 Mar 2021
Temporal Memory Relation Network for Workflow Recognition from Surgical Video
IEEE Transactions on Medical Imaging (IEEE TMI), 2021
Yueming Jin
Yonghao Long
Cheng Chen
Zixu Zhao
Qi Dou
Pheng-Ann Heng
244
117
0
30 Mar 2021
Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation
Shuning Chang
Pichao Wang
F. Wang
Hao Li
Jiashi Feng
ViT
217
46
0
30 Mar 2021
ViViT: A Video Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2021
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
545
2,702
0
29 Mar 2021
Memory Enhanced Embedding Learning for Cross-Modal Video-Text Retrieval
Rui Zhao
Kecheng Zheng
Zhengjun Zha
Hongtao Xie
Jiebo Luo
138
3
0
29 Mar 2021
Unified Graph Structured Models for Video Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
Anurag Arnab
Chen Sun
Cordelia Schmid
230
52
0
29 Mar 2021
Regular Polytope Networks
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
F. Pernici
Matteo Bruni
C. Baecchi
Marco Bertini
191
30
0
29 Mar 2021
On the hidden treasure of dialog in video question answering
IEEE International Conference on Computer Vision (ICCV), 2021
Deniz Engin
François Schnitzler
Ngoc Q. K. Duong
Yannis Avrithis
229
12
0
26 Mar 2021
Temporal Context Aggregation Network for Temporal Action Proposal Refinement
Computer Vision and Pattern Recognition (CVPR), 2021
Zhiwu Qing
Haisheng Su
Weihao Gan
Dongliang Wang
Wei Wu
Xiang Wang
Yu Qiao
Junjie Yan
Changxin Gao
Nong Sang
192
205
0
24 Mar 2021
Context-aware Biaffine Localizing Network for Temporal Sentence Grounding
Computer Vision and Pattern Recognition (CVPR), 2021
Daizong Liu
Xiaoye Qu
Jianfeng Dong
Pan Zhou
Yu Cheng
Wei Wei
Zichuan Xu
Yulai Xie
201
173
0
22 Mar 2021
PGT: A Progressive Method for Training Models on Long Videos
Computer Vision and Pattern Recognition (CVPR), 2021
Bo Pang
Gao Peng
Yizhuo Li
Cewu Lu
VLM
128
13
0
21 Mar 2021
Enhancing Transformer for Video Understanding Using Gated Multi-Level Attention and Temporal Adversarial Training
Saurabh Sahu
Palash Goyal
ViT
125
2
0
18 Mar 2021
ROAD: The ROad event Awareness Dataset for Autonomous Driving
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Gurkirt Singh
Stephen Akrigg
Manuele Di Maio
Valentina Fontana
Reza Javanmard Alitappeh
...
Salman Khan
S. Grazioso
Andrew Bradley
G. Gironimo
Fabio Cuzzolin
226
108
0
23 Feb 2021
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
EgoV
209
22
0
16 Feb 2021
Win-Fail Action Recognition
Paritosh Parmar
B. Morris
158
6
0
15 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
International Conference on Machine Learning (ICML), 2021
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
1.1K
2,648
0
09 Feb 2021
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
783
475
0
01 Feb 2021
Discovering Multi-Label Actor-Action Association in a Weakly Supervised Setting
Asian Conference on Computer Vision (ACCV), 2021
Sovan Biswas
Juergen Gall
166
2
0
21 Jan 2021
Smoothed Gaussian Mixture Models for Video Classification and Recommendation
Sirjan Kafle
Aman Gupta
Xue Xia
A. Sankar
Xi Chen
Di Wen
Liang Zhang
110
0
0
17 Dec 2020
NUTA: Non-uniform Temporal Aggregation for Action Recognition
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Hao Chen
Joseph Tighe
ViT
120
17
0
15 Dec 2020
A Comprehensive Study of Deep Video Action Recognition
Yi Zhu
Xinyu Li
Chunhui Liu
Mohammadreza Zolfaghari
Yuanjun Xiong
Chongruo Wu
Zhi-Li Zhang
Joseph Tighe
R. Manmatha
Mu Li
VLM
AI4TS
283
210
0
11 Dec 2020
CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2020
Yang Fu
Linjie Yang
Ding Liu
Thomas S. Huang
Humphrey Shi
VOS
284
75
0
07 Dec 2020
SAFCAR: Structured Attention Fusion for Compositional Action Recognition
Tae Soo Kim
Gregory Hager
CoGe
174
10
0
03 Dec 2020
Recent Progress in Appearance-based Action Recognition
J. Humphreys
Zhe Chen
Dacheng Tao
170
0
0
25 Nov 2020
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Guohao Li
239
143
0
23 Nov 2020
Memory Optimization for Deep Networks
International Conference on Learning Representations (ICLR), 2020
Aashaka Shah
Chaoxia Wu
Jayashree Mohan
Vijay Chidambaram
Philipp Krahenbuhl
157
27
0
27 Oct 2020
Hierarchical Conditional Relation Networks for Multimodal Video Question Answering
International Journal of Computer Vision (IJCV), 2020
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
BDL
356
28
0
18 Oct 2020
Pose And Joint-Aware Action Recognition
Anshul B. Shah
Shlok Kumar Mishra
Ankan Bansal
Jun-Cheng Chen
Ramalingam Chellappa
Abhinav Shrivastava
328
36
0
16 Oct 2020
Deep Sequence Learning for Video Anticipation: From Discrete and Deterministic to Continuous and Stochastic
S. Aliakbarian
AI4TS
128
0
0
09 Oct 2020
Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing
Okan Kopuklu
Stefan Hormann
Fabian Herzog
Hakan Çevikalp
Gerhard Rigoll
3DPC
151
17
0
30 Sep 2020
Texture Memory-Augmented Deep Patch-Based Image Inpainting
Rui Xu
Minghao Guo
Yuan Liu
Xiaoxiao Li
Bolei Zhou
Chen Change Loy
3DV
245
47
0
28 Sep 2020
Multi-Label Activity Recognition using Activity-specific Features and Activity Correlations
Computer Vision and Pattern Recognition (CVPR), 2020
Yanyi Zhang
Xinyu Li
I. Marsic
HAI
157
28
0
16 Sep 2020
Online Spatiotemporal Action Detection and Prediction via Causal Representations
Gurkirt Singh
3DPC
CML
181
0
0
31 Aug 2020
A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion Compensation for Action Recognition in the EPIC-Kitchens Dataset
Alejandro López-Cifuentes
Marcos Escudero-Viñolo
Jesús Bescós
EgoV
112
2
0
26 Aug 2020
Query Twice: Dual Mixture Attention Meta Learning for Video Summarization
Junyan Wang
Yang Bai
Yang Long
Bingzhang Hu
Z. Chai
Yu Guan
Xiaolin K. Wei
EgoV
208
21
0
19 Aug 2020
AssembleNet++: Assembling Modality Representations via Attention Connections
Michael S. Ryoo
A. Piergiovanni
Juhana Kangaspunta
A. Angelova
169
50
0
18 Aug 2020
Land Cover Classification from Remote Sensing Images Based on Multi-Scale Fully Convolutional Network
Geo-Spatial Information Science (GSIS), 2020
Rui Li
Shunyi Zheng
Chenxi Duan
Ce Zhang
325
121
0
01 Aug 2020
LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities
European Conference on Computer Vision (ECCV), 2020
Baoxiong Jia
Yixin Chen
Siyuan Huang
Yixin Zhu
Song-Chun Zhu
146
64
0
31 Jul 2020
Directional Temporal Modeling for Action Recognition
Xinyu Li
Bing Shuai
Joseph Tighe
123
47
0
21 Jul 2020
Context-Aware RCNN: A Baseline for Action Detection in Videos
European Conference on Computer Vision (ECCV), 2020
Jianchao Wu
Zhanghui Kuang
Limin Wang
Wayne Zhang
Gangshan Wu
228
83
0
20 Jul 2020
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions
European Conference on Computer Vision (ECCV), 2020
Noa Garcia
Yuta Nakashima
250
35
0
17 Jul 2020