Temporal Alignment Networks for Long-term Video

Computer Vision and Pattern Recognition (CVPR), 2022

6 April 2022

Papers citing "Temporal Alignment Networks for Long-term Video"

50 / 73 papers shown

EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT

214

27 Oct 2025

Training-free Online Video Step Grounding

128

19 Oct 2025

Learning Human Motion with Temporally Conditional Mamba

213

14 Oct 2025

Effectively obtaining acoustic, visual and textual data from videos

Jorge E. León

Miguel Carrasco

VGen

135

06 Sep 2025

Attention-Driven Multimodal Alignment for Long-term Action Quality Assessment

Applied Soft Computing (ASC), 2025

Xin Wang

Peng-Jie Li

Yuan-Yuan Shen

141

29 Jul 2025

SV3.3B: A Sports Video Understanding Model for Action Recognition

Sai Varun Kodathala

Yashwanth Reddy Vutukoori

Rakesh Vunnam

228

23 Jul 2025

Scene Detection Policies and Keyframe Extraction Strategies for Large-Scale Video Analysis

Vasilii Korolkov

151

31 May 2025

Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration

293

22 May 2025

I^2G: Generating Instructional Illustrations via Text-Conditioned Diffusion

227

22 May 2025

SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation

Computer Vision and Pattern Recognition (CVPR), 2025

242

08 Apr 2025

Learning Activity View-invariance Under Extreme Viewpoint Changes via Curriculum Knowledge Distillation

185

07 Apr 2025

Stitch-a-Demo: Video Demonstrations from Multistep Descriptions

282

18 Mar 2025

Enhancing Explainability with Multimodal Context Representations for Smarter Robots

Anargh Viswanath

Lokesh Veeramacheneni

Hendrik Buschmeier

173

28 Feb 2025

Hierarchical Banzhaf Interaction for General Video-Language Representation Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

395

31 Dec 2024

Video LLMs for Temporal Reasoning in Long Videos

658

04 Dec 2024

ACE: Action Concept Enhancement of Video-Language Models in Procedural Videos

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

284

23 Nov 2024

Grounded Video Caption Generation

Evangelos Kazakos

Cordelia Schmid

Josef Sivic

270

12 Nov 2024

PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

243

04 Nov 2024

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

European Conference on Computer Vision (ECCV), 2024

Yu Kong

Martin Renqiang Min

Dimitris N. Metaxas

DiffM

291

22 Sep 2024

Disentangle and denoise: Tackling context misalignment for video moment retrieval

Yongxiang Li

227

14 Aug 2024

ExpertAF: Expert Actionable Feedback from Video

Computer Vision and Pattern Recognition (CVPR), 2024

454

01 Aug 2024

Learning Video Context as Interleaved Multimodal Sequences

243

31 Jul 2024

Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

See-Kiong Ng

Luu Anh Tuan

475

04 Jul 2024

MatchTime: Towards Automatic Soccer Game Commentary Generation

Yanfeng Wang

249

26 Jun 2024

Multilingual Synopses of Movie Narratives: A Dataset for Story Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Yidan Sun

Jianfei Yu

Boyang Li

244

18 Jun 2024

Learning Object States from Actions via Large Language Models

Masatoshi Tateno

Takuma Yagi

Ryosuke Furuta

Yoichi Sato

134

02 May 2024

Step Differences in Instructional Video

Tushar Nagarajan

Lorenzo Torresani

VGen

423

24 Apr 2024

AutoAD III: The Prequel -- Back to the Pixels

311

22 Apr 2024

LongVLM: Efficient Long Video Understanding via Large Language Models

European Conference on Computer Vision (ECCV), 2024

Yuetian Weng

Mingfei Han

Haoyu He

Xiaojun Chang

Bohan Zhuang

VLM

371

126

04 Apr 2024

VidLA: Video-Language Alignment at Scale

Computer Vision and Pattern Recognition (CVPR), 2024

Mamshad Nayeem Rizve

Fan Fei

Jayakrishnan Unnikrishnan

Mubarak Shah

224

21 Mar 2024

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Xiaojian Ma

Yuntao Du

302

148

18 Mar 2024

Video Editing for Video Retrieval

Dima Damen

203

04 Feb 2024

Multi-granularity Correspondence Learning from Long-term Noisy Videos

341

30 Jan 2024

Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)

...

233

23 Jan 2024

Distilling Vision-Language Models on Millions of Videos

Computer Vision and Pattern Recognition (CVPR), 2024

...

279

11 Jan 2024

Detours for Navigating Instructional Videos

Computer Vision and Pattern Recognition (CVPR), 2024

482

03 Jan 2024

Retrieval-Augmented Egocentric Video Captioning

Computer Vision and Pattern Recognition (CVPR), 2024

Jilan Xu

Yifei Huang

Junlin Hou

Rui Feng

409

01 Jan 2024

CaptainCook4D: A dataset for understanding errors in procedural activities

Rohith Peddi

Shivvrat Arya

B. Challa

Likhitha Pallapothula

Akshay Vyas

...

Vasundhara Komaragiri

269

22 Dec 2023

A Strong Baseline for Temporal Video-Text Alignment

268

21 Dec 2023

Text-Conditioned Resampler For Long Form Video Understanding

305

19 Dec 2023

Learning Object State Changes in Videos: An Open-World Perspective

341

19 Dec 2023

Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis

305

18 Dec 2023

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

Computer Vision and Pattern Recognition (CVPR), 2023

Dima Damen

257

12 Dec 2023

LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering

329

08 Dec 2023

Efficient Pre-training for Localized Instruction Generation of Videos

374

27 Nov 2023

Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding

267

25 Nov 2023

HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

European Conference on Computer Vision (ECCV), 2023

Nina Shvetsova

Anna Kukleva

Xudong Hong

Christian Rupprecht

Bernt Schiele

Hilde Kuehne

297

07 Oct 2023

VidChapters-7M: Video Chapters at Scale

Neural Information Processing Systems (NeurIPS), 2023

246

25 Sep 2023

Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation

IEEE Transactions on Image Processing (IEEE TIP), 2023

Hefeng Wu

309

23 Sep 2023

In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval

IEEE International Conference on Computer Vision (ICCV), 2023

Bernt Schiele

229

16 Sep 2023