Temporal Alignment Networks for Long-term Video

Computer Vision and Pattern Recognition (CVPR), 2022
6 April 2022
Tengda Han
Weidi Xie
Andrew Zisserman
    AI4TS

Papers citing "Temporal Alignment Networks for Long-term Video"

50 / 73 papers shown
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
Baoqi Pei
Yifei Huang
Jilan Xu
Yuping He
Guo Chen
Fei Wu
Yu Qiao
Jiangmiao Pang
EgoV LRM
214
4
0
27 Oct 2025
Training-free Online Video Step Grounding
Luca Zanella
Massimiliano Mancini
Yiming Wang
Alessio Tonioni
Elisa Ricci
128
0
0
19 Oct 2025
Learning Human Motion with Temporally Conditional Mamba
Quang Minh Nguyen
T. H. Le
Baoru Huang
M. Vu
Ngan Le
Thieu Vo
Anh Duc Nguyen
Mamba
213
0
0
14 Oct 2025
Effectively obtaining acoustic, visual and textual data from videos
Jorge E. León
Miguel Carrasco
VGen
135
1
0
06 Sep 2025
Attention-Driven Multimodal Alignment for Long-term Action Quality Assessment
Applied Soft Computing (ASC), 2025
Xin Wang
Peng-Jie Li
Yuan-Yuan Shen
141
0
0
29 Jul 2025
SV3.3B: A Sports Video Understanding Model for Action Recognition
Sai Varun Kodathala
Yashwanth Reddy Vutukoori
Rakesh Vunnam
228
2
0
23 Jul 2025
Scene Detection Policies and Keyframe Extraction Strategies for Large-Scale Video Analysis
Vasilii Korolkov
151
1
0
31 May 2025
Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration
Yuetong Liu
Yunqiu Xu
Yang Wei
Xiuli Bi
Bin Xiao
293
1
0
22 May 2025
$I^2G$: Generating Instructional Illustrations via Text-Conditioned Diffusion
Jing Bi
Pinxin Liu
Ali Vosoughi
Jiarui Wu
Jinxi He
Chenliang Xu
DiffM
227
0
0
22 May 2025
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Computer Vision and Pattern Recognition (CVPR), 2025
Hao Du
Bo Wu
Yan Lu
Zhendong Mao
242
1
0
08 Apr 2025
Learning Activity View-invariance Under Extreme Viewpoint Changes via Curriculum Knowledge Distillation
Arjun Somayazulu
E. Mavroudi
Changan Chen
Lorenzo Torresani
Kristen Grauman
185
1
0
07 Apr 2025
Stitch-a-Demo: Video Demonstrations from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
282
1
0
18 Mar 2025
Enhancing Explainability with Multimodal Context Representations for Smarter Robots
Anargh Viswanath
Lokesh Veeramacheneni
Hendrik Buschmeier
173
1
0
28 Feb 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
395
4
0
31 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
658
6
0
04 Dec 2024
ACE: Action Concept Enhancement of Video-Language Models in Procedural Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Reza Ghoddoosian
Nakul Agarwal
Isht Dwivedi
Behzad Dariush
284
0
0
23 Nov 2024
Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
270
0
0
12 Nov 2024
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance
Ruyang Liu
Haoran Tang
Haibo Liu
Yixiao Ge
Mingyu Ding
Chen Li
Jiankun Yang
VLM
243
17
0
04 Nov 2024
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
European Conference on Computer Vision (ECCV), 2024
Yuxiao Chen
Keqin Li
Wentao Bao
Deep Patel
Yu Kong
Martin Renqiang Min
Dimitris N. Metaxas
DiffM
291
5
0
22 Sep 2024
Disentangle and denoise: Tackling context misalignment for video moment retrieval
Kaijing Ma
Han Fang
Xianghao Zang
Chao Ban
Lanxiang Zhou
Zhongjiang He
Yongxiang Li
Hao Sun
Zerun Feng
Xingsong Hou
227
1
0
14 Aug 2024
ExpertAF: Expert Actionable Feedback from Video
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Tushar Nagarajan
Georgios Pavlakos
Kris Kitani
Kristen Grauman
VGen
454
10
0
01 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
243
12
0
31 Jul 2024
Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
Thong Nguyen
Yi Bin
Xiaobao Wu
Xinshuai Dong
Zhiyuan Hu
Khoi M. Le
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
475
10
0
04 Jul 2024
MatchTime: Towards Automatic Soccer Game Commentary Generation
Jiayuan Rao
Haoning Wu
Chang-rui Liu
Yanfeng Wang
Weidi Xie
249
27
0
26 Jun 2024
Multilingual Synopses of Movie Narratives: A Dataset for Story Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yidan Sun
Jianfei Yu
Boyang Li
244
0
0
18 Jun 2024
Learning Object States from Actions via Large Language Models
Masatoshi Tateno
Takuma Yagi
Ryosuke Furuta
Yoichi Sato
134
2
0
02 May 2024
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
423
10
0
24 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen DiffM
311
33
0
22 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
European Conference on Computer Vision (ECCV), 2024
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
371
126
0
04 Apr 2024
VidLA: Video-Language Alignment at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul Chilimbi
VLM AI4TS
224
8
0
21 Mar 2024
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Yue Fan
Xiaojian Ma
Rujie Wu
Yuntao Du
Jiaqi Li
Zhi Gao
Qing Li
VLM LLMAG
302
148
0
18 Mar 2024
Video Editing for Video Retrieval
Bin Zhu
Kevin Flanagan
A. Fragomeni
Michael Wray
Dima Damen
CLIP
203
1
0
04 Feb 2024
Multi-granularity Correspondence Learning from Long-term Noisy Videos
Yijie Lin
Jie Zhang
Zhenyu Huang
Jia-Wei Liu
Zujie Wen
Xi Peng
341
35
0
30 Jan 2024
Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)
Shih-Han Chou
Matthew Kowal
Yasmin Niknam
Diana Moyano
Shayaan Mehdi
...
Cheng Zhang
Ian Knopke
S. Kocak
Leonid Sigal
Yalda Mohsenzadeh
233
2
0
23 Jan 2024
Distilling Vision-Language Models on Millions of Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Yue Zhao
Long Zhao
Xingyi Zhou
Jialin Wu
Chun-Te Chu
...
Hartwig Adam
Ting Liu
Boqing Gong
Philipp Krahenbuhl
Liangzhe Yuan
VLM
279
20
0
11 Jan 2024
Detours for Navigating Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
482
7
0
03 Jan 2024
Retrieval-Augmented Egocentric Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2024
Jilan Xu
Yifei Huang
Junlin Hou
Guo Chen
Yue Zhang
Rui Feng
Weidi Xie
EgoV
409
49
0
01 Jan 2024
CaptainCook4D: A dataset for understanding errors in procedural activities
Rohith Peddi
Shivvrat Arya
B. Challa
Likhitha Pallapothula
Akshay Vyas
...
Vasundhara Komaragiri
Eric D. Ragan
Nicholas Ruozzi
Yu Xiang
Vibhav Gogate
269
29
0
22 Dec 2023
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya Zhang
Yanfeng Wang
Weidi Xie
AI4TS VGen
268
11
0
21 Dec 2023
Text-Conditioned Resampler For Long Form Video Understanding
Bruno Korbar
Yongqin Xian
A. Tonioni
Andrew Zisserman
Federico Tombari
305
23
0
19 Dec 2023
Learning Object State Changes in Videos: An Open-World Perspective
Zihui Xue
Kumar Ashutosh
Kristen Grauman
VGen
341
33
0
19 Dec 2023
Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis
Tianyao He
Huabin Liu
Yuxi Li
Xiao Ma
Cheng Zhong
Yang Zhang
Weiyao Lin
305
7
0
18 Dec 2023
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Tomáš Souček
Dima Damen
Michael Wray
Ivan Laptev
Josef Sivic
VGen
257
39
0
12 Dec 2023
LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering
Hongjie Zhang
Lu Dong
Yi Liu
Yifei Huang
Z. Ling
Yali Wang
Limin Wang
329
32
0
08 Dec 2023
Efficient Pre-training for Localized Instruction Generation of Videos
Anil Batra
Davide Moltisanti
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
374
0
0
27 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
267
4
0
25 Nov 2023
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
European Conference on Computer Vision (ECCV), 2023
Nina Shvetsova
Anna Kukleva
Xudong Hong
Christian Rupprecht
Bernt Schiele
Hilde Kuehne
297
31
0
07 Oct 2023
VidChapters-7M: Video Chapters at Scale
Neural Information Processing Systems (NeurIPS), 2023
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
246
39
0
25 Sep 2023
Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Tao Pu
Tianshui Chen
Hefeng Wu
Yongyi Lu
Liangjie Lin
ViT
309
17
0
23 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
229
6
0
16 Sep 2023