2204.02968
Cited By
Temporal Alignment Networks for Long-term Video
Computer Vision and Pattern Recognition (CVPR), 2022
6 April 2022
Tengda Han
Weidi Xie
Andrew Zisserman
AI4TS
Papers citing
"Temporal Alignment Networks for Long-term Video"
50 / 73 papers shown
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
Baoqi Pei
Yifei Huang
Jilan Xu
Yuping He
Guo Chen
Fei Wu
Yu Qiao
Jiangmiao Pang
EgoV
LRM
214
4
0
27 Oct 2025
Training-free Online Video Step Grounding
Luca Zanella
Massimiliano Mancini
Yiming Wang
Alessio Tonioni
Elisa Ricci
128
0
0
19 Oct 2025
Learning Human Motion with Temporally Conditional Mamba
Quang Minh Nguyen
T. H. Le
Baoru Huang
M. Vu
Ngan Le
Thieu Vo
Anh Duc Nguyen
Mamba
213
0
0
14 Oct 2025
Effectively obtaining acoustic, visual and textual data from videos
Jorge E. León
Miguel Carrasco
VGen
135
1
0
06 Sep 2025
Attention-Driven Multimodal Alignment for Long-term Action Quality Assessment
Applied Soft Computing (ASC), 2025
Xin Wang
Peng-Jie Li
Yuan-Yuan Shen
141
0
0
29 Jul 2025
SV3.3B: A Sports Video Understanding Model for Action Recognition
Sai Varun Kodathala
Yashwanth Reddy Vutukoori
Rakesh Vunnam
228
2
0
23 Jul 2025
Scene Detection Policies and Keyframe Extraction Strategies for Large-Scale Video Analysis
Vasilii Korolkov
151
1
0
31 May 2025
Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration
Yuetong Liu
Yunqiu Xu
Yang Wei
Xiuli Bi
Bin Xiao
293
1
0
22 May 2025
I^2G: Generating Instructional Illustrations via Text-Conditioned Diffusion
Jing Bi
Pinxin Liu
Ali Vosoughi
Jiarui Wu
Jinxi He
Chenliang Xu
DiffM
227
0
0
22 May 2025
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Computer Vision and Pattern Recognition (CVPR), 2025
Hao Du
Bo Wu
Yan Lu
Zhendong Mao
242
1
0
08 Apr 2025
Learning Activity View-invariance Under Extreme Viewpoint Changes via Curriculum Knowledge Distillation
Arjun Somayazulu
E. Mavroudi
Changan Chen
Lorenzo Torresani
Kristen Grauman
185
1
0
07 Apr 2025
Stitch-a-Demo: Video Demonstrations from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
282
1
0
18 Mar 2025
Enhancing Explainability with Multimodal Context Representations for Smarter Robots
Anargh Viswanath
Lokesh Veeramacheneni
Hendrik Buschmeier
173
1
0
28 Feb 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
395
4
0
31 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
658
6
0
04 Dec 2024
ACE: Action Concept Enhancement of Video-Language Models in Procedural Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Reza Ghoddoosian
Nakul Agarwal
Isht Dwivedi
Behzad Darisuh
284
0
0
23 Nov 2024
Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
270
0
0
12 Nov 2024
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
Ruyang Liu
Haoran Tang
Haibo Liu
Yixiao Ge
Mingyu Ding
Chen Li
Jiankun Yang
VLM
243
17
0
04 Nov 2024
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
European Conference on Computer Vision (ECCV), 2024
Yuxiao Chen
Keqin Li
Wentao Bao
Deep Patel
Yu Kong
Martin Renqiang Min
Dimitris N. Metaxas
DiffM
291
5
0
22 Sep 2024
Disentangle and denoise: Tackling context misalignment for video moment retrieval
Kaijing Ma
Han Fang
Xianghao Zang
Chao Ban
Lanxiang Zhou
Zhongjiang He
Yongxiang Li
Hao Sun
Zerun Feng
Xingsong Hou
227
1
0
14 Aug 2024
ExpertAF: Expert Actionable Feedback from Video
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris Kitani
Kristen Grauman
VGen
454
10
0
01 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
243
12
0
31 Jul 2024
Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
Thong Nguyen
Yi Bin
Xiaobao Wu
Xinshuai Dong
Zhiyuan Hu
Khoi M. Le
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
475
10
0
04 Jul 2024
MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao
Haoning Wu
Chang-rui Liu
Yanfeng Wang
Weidi Xie
249
27
0
26 Jun 2024
Multilingual Synopses of Movie Narratives: A Dataset for Story Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yidan Sun
Jianfei Yu
Boyang Li
244
0
0
18 Jun 2024
Learning Object States from Actions via Large Language Models
Masatoshi Tateno
Takuma Yagi
Ryosuke Furuta
Yoichi Sato
134
2
0
02 May 2024
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
423
10
0
24 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
311
33
0
22 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
European Conference on Computer Vision (ECCV), 2024
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
371
126
0
04 Apr 2024
VidLA: Video-Language Alignment at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul Chilimbi
VLM
AI4TS
224
8
0
21 Mar 2024
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Yue Fan
Xiaojian Ma
Rujie Wu
Yuntao Du
Jiaqi Li
Zhi Gao
Qing Li
VLM
LLMAG
302
148
0
18 Mar 2024
Video Editing for Video Retrieval
Bin Zhu
Kevin Flanagan
A. Fragomeni
Michael Wray
Dima Damen
CLIP
203
1
0
04 Feb 2024
Multi-granularity Correspondence Learning from Long-term Noisy Videos
Yijie Lin
Jie Zhang
Zhenyu Huang
Jia-Wei Liu
Zujie Wen
Xi Peng
341
35
0
30 Jan 2024
Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)
Shih-Han Chou
Matthew Kowal
Yasmin Niknam
Diana Moyano
Shayaan Mehdi
...
Cheng Zhang
Ian Knopke
S. Kocak
Leonid Sigal
Yalda Mohsenzadeh
233
2
0
23 Jan 2024
Distilling Vision-Language Models on Millions of Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Yue Zhao
Long Zhao
Xingyi Zhou
Jialin Wu
Chun-Te Chu
...
Hartwig Adam
Ting Liu
Boqing Gong
Philipp Krähenbühl
Liangzhe Yuan
VLM
279
20
0
11 Jan 2024
Detours for Navigating Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
482
7
0
03 Jan 2024
Retrieval-Augmented Egocentric Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2024
Jilan Xu
Yifei Huang
Junlin Hou
Guo Chen
Yue Zhang
Rui Feng
Weidi Xie
EgoV
409
49
0
01 Jan 2024
CaptainCook4D: A dataset for understanding errors in procedural activities
Rohith Peddi
Shivvrat Arya
B. Challa
Likhitha Pallapothula
Akshay Vyas
...
Vasundhara Komaragiri
Eric D. Ragan
Nicholas Ruozzi
Yu Xiang
Vibhav Gogate
269
29
0
22 Dec 2023
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
268
11
0
21 Dec 2023
Text-Conditioned Resampler For Long Form Video Understanding
Bruno Korbar
Yongqin Xian
A. Tonioni
Andrew Zisserman
Federico Tombari
305
23
0
19 Dec 2023
Learning Object State Changes in Videos: An Open-World Perspective
Zihui Xue
Kumar Ashutosh
Kristen Grauman
VGen
341
33
0
19 Dec 2023
Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis
Tianyao He
Huabin Liu
Yuxi Li
Xiao Ma
Cheng Zhong
Yang Zhang
Weiyao Lin
305
7
0
18 Dec 2023
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Tomáš Souček
Dima Damen
Michael Wray
Ivan Laptev
Josef Sivic
VGen
257
39
0
12 Dec 2023
LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering
Hongjie Zhang
Lu Dong
Yi Liu
Yifei Huang
Z. Ling
Yali Wang
Limin Wang
329
32
0
08 Dec 2023
Efficient Pre-training for Localized Instruction Generation of Videos
Anil Batra
Davide Moltisanti
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
374
0
0
27 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
267
4
0
25 Nov 2023
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
European Conference on Computer Vision (ECCV), 2023
Nina Shvetsova
Anna Kukleva
Xudong Hong
Christian Rupprecht
Bernt Schiele
Hilde Kuehne
297
31
0
07 Oct 2023
VidChapters-7M: Video Chapters at Scale
Neural Information Processing Systems (NeurIPS), 2023
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
246
39
0
25 Sep 2023
Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Tao Pu
Tianshui Chen
Hefeng Wu
Yongyi Lu
Liangjie Lin
ViT
309
17
0
23 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
229
6
0
16 Sep 2023