Long-Term Feature Banks for Detailed Video Understanding

12 December 2018
Chao-Yuan Wu
Christoph Feichtenhofer
Haoqi Fan
Kaiming He
Philipp Krähenbühl
Ross B. Girshick

Papers citing "Long-Term Feature Banks for Detailed Video Understanding"

50 / 315 papers shown
Multi-Task Learning of Object State Changes from Uncurated Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
194
13
0
24 Nov 2022
Discovering A Variety of Objects in Spatio-Temporal Human-Object Interactions
Yong-Lu Li
Hongwei Fan
Zuoyu Qiu
Yiming Dou
Liang Xu
...
Peiyang Guo
Haisheng Su
Dongliang Wang
Wei Wu
Cewu Lu
195
8
0
14 Nov 2022
End-to-end Transformer for Compressed Video Quality Enhancement
IEEE Transactions on Broadcasting (IEEE Trans. Broadcast.), 2022
Li Yu
Wenshuai Chang
Shiyu Wu
Moncef Gabbouj
ViT
193
17
0
25 Oct 2022
Holistic Interaction Transformer Network for Action Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Gueter Josmy Faure
Min-Hung Chen
S. Lai
291
48
0
23 Oct 2022
YOWO-Plus: An Incremental Improvement
Jianhua Yang
ViT
130
5
0
20 Oct 2022
Grounded Video Situation Recognition
Neural Information Processing Systems (NeurIPS), 2022
Zeeshan Khan
C. V. Jawahar
Makarand Tapaswi
190
16
0
19 Oct 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Neural Information Processing Systems (NeurIPS), 2022
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS, VLM
276
84
0
12 Oct 2022
ConTra: (Con)text (Tra)nsformer for Cross-Modal Video Retrieval
Asian Conference on Computer Vision (ACCV), 2022
A. Fragomeni
Michael Wray
Dima Damen
CLIP, ViT
145
4
0
09 Oct 2022
Compressed Vision for Efficient Video Understanding
Asian Conference on Computer Vision (ACCV), 2022
Olivia Wiles
João Carreira
Iain Barr
Andrew Zisserman
Mateusz Malinowski
119
10
0
06 Oct 2022
COPILOT: Human-Environment Collision Prediction and Localization from Egocentric Videos
IEEE International Conference on Computer Vision (ICCV), 2022
Boxiao Pan
Bokui Shen
Davis Rempe
Despoina Paschalidou
Kaichun Mo
Yanchao Yang
Leonidas Guibas
149
3
0
04 Oct 2022
Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Yifan Lu
Gurkirt Singh
Suman Saha
Luc Van Gool
TTA
336
3
0
28 Sep 2022
Visual Object Tracking in First Person Vision
International Journal of Computer Vision (IJCV), 2022
Matteo Dunnhofer
Antonino Furnari
G. Farinella
C. Micheloni
235
41
0
27 Sep 2022
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Zhijian Hou
Wanjun Zhong
Lei Ji
Difei Gao
Kun Yan
W. Chan
Chong-Wah Ngo
Zheng Shou
Nan Duan
AI4TS
253
34
0
22 Sep 2022
MCIBI++: Soft Mining Contextual Information Beyond Image for Semantic Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Zhenchao Jin
Dongdong Yu
Zehuan Yuan
Lequan Yu
370
25
0
09 Sep 2022
Spatio-Temporal Action Detection Under Large Motion
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Gurkirt Singh
Vasileios Choutas
Suman Saha
Feng Yu
Luc Van Gool
267
15
0
06 Sep 2022
A comprehensive survey on recent deep learning-based methods applied to surgical data
Mansoor Ali
Rafael Martinez Garcia Peña
Gilberto Ochoa-Ruiz
Sharib Ali
419
7
0
03 Sep 2022
Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
European Conference on Computer Vision (ECCV), 2022
Tianjiao Li
Lin Geng Foo
Qiuhong Ke
Hossein Rahmani
Anran Wang
Jinghua Wang
Jing Liu
197
30
0
03 Sep 2022
A Circular Window-based Cascade Transformer for Online Action Detection
Shuyuan Cao
Weihua Luo
Bairui Wang
Wei Emma Zhang
Lin Ma
191
6
0
30 Aug 2022
Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding
Stephen Su
Sam Kwong
Qingyu Zhao
De-An Huang
Juan Carlos Niebles
Ehsan Adeli
175
0
0
22 Aug 2022
EgoEnv: Human-centric environment representations from egocentric video
Neural Information Processing Systems (NeurIPS), 2022
Tushar Nagarajan
Santhosh Kumar Ramakrishnan
Ruta Desai
James M. Hillis
Kristen Grauman
EgoV
296
25
0
22 Jul 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Asian Conference on Computer Vision (ACCV), 2022
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
346
31
0
20 Jul 2022
ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network
IEEE Access, 2022
Nikolaos Gkalelis
Dimitrios Daskalakis
Vasileios Mezaris
204
12
0
20 Jul 2022
Learning Sequence Representations by Non-local Recurrent Neural Memory
International Journal of Computer Vision (IJCV), 2022
Wenjie Pei
Xin Feng
Canmiao Fu
Qi Cao
Guangming Lu
Yu-Wing Tai
AI4TS
295
0
0
20 Jul 2022
Learning from Label Relationships in Human Affect
ACM Multimedia (ACM MM), 2022
Niki Maria Foteinopoulou
Ioannis Patras
CVBM
189
14
0
12 Jul 2022
Beyond Transfer Learning: Co-finetuning for Action Localisation
Anurag Arnab
Xuehan Xiong
A. Gritsenko
Rob Romijnders
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
262
10
0
08 Jul 2022
Explore Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline
Computer Vision and Pattern Recognition (CVPR), 2022
Kailai Zhou
Yibo Wang
Tao Lv
Yunqian Li
Linsen Chen
Qiu Shen
Xun Cao
206
18
0
23 Jun 2022
One-stage Action Detection Transformer
Lijun Li
Lian Zhuo
Bangyin Zhang
ViT
112
0
0
21 Jun 2022
It's Time for Artistic Correspondence in Music and Video
Computer Vision and Pattern Recognition (CVPR), 2022
Dídac Surís
Carl Vondrick
Bryan C. Russell
Justin Salamon
151
42
0
14 Jun 2022
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Lin Sui
Chen-Da Liu-Zhang
Lixin Gu
Feng Han
273
13
0
07 Jun 2022
Revisiting the "Video" in Video-Language Understanding
Computer Vision and Pattern Recognition (CVPR), 2022
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
213
202
0
03 Jun 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
418
73
0
17 May 2022
Retrieval-Enhanced Machine Learning
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022
Hamed Zamani
Fernando Diaz
Mostafa Dehghani
Donald Metzler
Michael Bendersky
165
59
0
02 May 2022
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
Computer Vision and Pattern Recognition (CVPR), 2022
Alexandros Stergiou
Dima Damen
AI4TS, EgoV, EDL
171
14
0
28 Apr 2022
Temporal Relevance Analysis for Video Action Models
Quanfu Fan
Donghyun Kim
Chun-Fu Chen
Stan Sclaroff
Kate Saenko
Sarah Adel Bargal
FAtt
165
1
0
25 Apr 2022
A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions
Fan Yang
240
6
0
21 Apr 2022
THORN: Temporal Human-Object Relation Network for Action Recognition
International Conference on Pattern Recognition (ICPR), 2022
Mohammed Guermal
Rui Dai
Francois Bremond
EgoV
169
3
0
20 Apr 2022
LaMemo: Language Modeling with Look-Ahead Memory
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Haozhe Ji
Rongsheng Zhang
Zhenyu Yang
Zhipeng Hu
Shiyu Huang
KELM, RALM, CLL
164
4
0
15 Apr 2022
SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition
European Conference on Computer Vision (ECCV), 2022
Victor Escorcia
Ricardo Guerrero
Xiatian Zhu
Brais Martínez
EgoV
228
12
0
10 Apr 2022
E^2TAD: An Energy-Efficient Tracking-based Action Detector
Xin Hu
Zhenyu Wu
Haoyuan Miao
Siqi Fan
Taiyu Long
...
Pengcheng Pi
Yi Wu
Zhou Ren
Zinan Lin
G. Hua
432
2
0
09 Apr 2022
Hierarchical Self-supervised Representation Learning for Movie Understanding
Computer Vision and Pattern Recognition (CVPR), 2022
Fanyi Xiao
Kaustav Kundu
Joseph Tighe
Davide Modolo
SSL
198
27
0
06 Apr 2022
Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
Computer Vision and Pattern Recognition (CVPR), 2022
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yi Tian Xu
Xiang Wang
Mingqian Tang
Changxin Gao
Rong Jin
Nong Sang
SSL, AI4TS
242
18
0
06 Apr 2022
TALLFormer: Temporal Action Localization with a Long-memory Transformer
European Conference on Computer Vision (ECCV), 2022
Feng Cheng
Gedas Bertasius
ViT
321
119
0
04 Apr 2022
Exploiting Temporal Relations on Radar Perception for Autonomous Driving
Computer Vision and Pattern Recognition (CVPR), 2022
Peizhao Li
Puzuo Wang
K. Berntorp
Hongfu Liu
276
50
0
03 Apr 2022
A-ACT: Action Anticipation through Cycle Transformations
Akash Gupta
Jingen Liu
Liefeng Bo
Amit K. Roy-Chowdhury
Tao Mei
209
7
0
02 Apr 2022
MeMOT: Multi-Object Tracking with Memory
Computer Vision and Pattern Recognition (CVPR), 2022
Jiarui Cai
Mingze Xu
Wei Li
Yuanjun Xiong
Wei Xia
Zhuowen Tu
Stefano Soatto
VOT
321
213
0
31 Mar 2022
Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
Computer Vision and Pattern Recognition (CVPR), 2022
Feng Cheng
Ming Xu
Yuanjun Xiong
Hao Chen
Xinyu Li
Wei Li
Wei Xia
137
18
0
31 Mar 2022
Global Tracking Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Xingyi Zhou
Tianwei Yin
V. Koltun
Philipp Krähenbühl
VOT
282
174
0
24 Mar 2022
Point3D: tracking actions as moving points with 3D CNNs
British Machine Vision Conference (BMVC), 2022
Shentong Mo
Jingfei Xia
Xiaoqing Ellen Tan
Bhiksha Raj
3DPC
252
5
0
20 Mar 2022
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
322
101
0
18 Mar 2022
Gate-Shift-Fuse for Video Action Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
267
33
0
16 Mar 2022