Revisiting the "Video" in Video-Language Understanding

3 June 2022
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles

Papers citing "Revisiting the "Video" in Video-Language Understanding"

50 / 122 papers shown
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining
Lu Dong
H. Zhang
Hongjie Zhang
Y. Huang
Z. Ling
Yu Qiao
Limin Wang
Y. Wang
AI4TS
24
0
0
10 May 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations for Synthetic Videos
Zongxia Li
Xiyang Wu
Yubin Qin
Guangyao Shi
Hongyang Du
Dinesh Manocha
Tianyi Zhou
Jordan Boyd-Graber
MLLM
46
0
0
02 May 2025
Causality-Driven Neural Network Repair: Challenges and Opportunities
Fatemeh Vares
Brittany Johnson
AAML
43
0
0
24 Apr 2025
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Hao Du
Bo Wu
Yan Lu
Zhendong Mao
22
0
0
08 Apr 2025
Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA
Zijie Song
Zhenzhen Hu
Yixiao Ma
Jia Li
Richang Hong
16
0
0
08 Apr 2025
VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT
Zhuo Zhi
Qiangqiang Wu
Minghe Shen
W. J. Li
Yinchuan Li
Kun Shao
Kaiwen Zhou
LLMAG
33
0
0
06 Apr 2025
VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence
Hao Li
Hao Fei
Zechao Hu
Zhengwei Yang
Zheng Wang
45
0
0
03 Apr 2025
Leveraging Static Relationships for Intra-Type and Inter-Type Message Passing in Video Question Answering
Lili Liang
Guanglu Sun
46
0
0
03 Apr 2025
Can Text-to-Video Generation help Video-Language Alignment?
Luca Zanella
Massimiliano Mancini
Willi Menapace
Sergey Tulyakov
Yiming Wang
Elisa Ricci
DiffM
VGen
55
0
0
24 Mar 2025
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Nina Shvetsova
Arsha Nagrani
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
42
0
0
24 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad
Vibhav Vineet
Y. S. Rawat
VLM
93
1
0
11 Mar 2025
Towards Fine-Grained Video Question Answering
Wei Dai
Alan Luo
Zane Durante
Debadutta Dash
Arnold Milstein
Kevin Schulman
Ehsan Adeli
L. Fei-Fei
58
1
0
10 Mar 2025
ToFu: Visual Tokens Reduction via Fusion for Multi-modal, Multi-patch, Multi-image Task
Vittorio Pippi
Matthieu Guillaumin
S. Cascianelli
Rita Cucchiara
M. Jaritz
Loris Bazzani
62
0
0
06 Mar 2025
OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation
Yunpeng Gao
C. Li
Zhongrui You
J. Liu
Zhen Li
...
Yan Ding
Dong Wang
Z. Wang
Bin Zhao
X. Li
39
4
0
25 Feb 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
58
24
0
31 Dec 2024
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
H. Li
Li Yuan
Shuicheng Yan
Jie Chen
45
1
0
31 Dec 2024
GFG -- Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda
Andrea Piergentili
Beatrice Savoldi
Marco Madeddu
Martina Rosola
Silvia Casola
Chiara Ferrando
V. Patti
Matteo Negri
L. Bentivogli
35
2
0
31 Dec 2024
When SAM2 Meets Video Shadow and Mirror Detection
Leiping Jie
VLM
35
1
0
26 Dec 2024
Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering
Sai Bhargav Rongali
M. Cui
Ankit Jha
Neha Bhargava
Saurabh Prasad
Biplab Banerjee
77
0
0
12 Dec 2024
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning
Y. Wang
Zhikang Zhang
Jue Wang
D. Fan
Zhenlin Xu
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
VLM
69
1
0
10 Dec 2024
Streaming Detection of Queried Event Start
Cristobal Eyzaguirre
Eric Tang
S. Buch
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
69
0
0
04 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
81
0
0
04 Dec 2024
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
100
1
0
03 Dec 2024
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
66
2
0
20 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
92
0
0
20 Nov 2024
HourVideo: 1-Hour Video-Language Understanding
Keshigeyan Chandrasegaran
Agrim Gupta
Lea M. Hadzic
Taran Kota
Jimming He
Cristobal Eyzaguirre
Zane Durante
Manling Li
Jiajun Wu
L. Fei-Fei
VLM
39
31
0
07 Nov 2024
Beyond Coarse-Grained Matching in Video-Text Retrieval
Aozhu Chen
Hazel Doughty
Xirong Li
Cees G. M. Snoek
21
0
0
16 Oct 2024
LocoMotion: Learning Motion-Focused Video-Language Representations
Hazel Doughty
Fida Mohammad Thoker
Cees G. M. Snoek
33
2
0
15 Oct 2024
Depth Any Video with Scalable Synthetic Data
Honghui Yang
Di Huang
Wei Yin
Chunhua Shen
Haifeng Liu
Xiaofei He
Binbin Lin
Wanli Ouyang
Tong He
VGen
MDE
21
16
0
14 Oct 2024
Prompting Video-Language Foundation Models with Domain-specific Fine-grained Heuristics for Video Question Answering
Ting Yu
Kunhao Fu
Shuhui Wang
Qingming Huang
Jun Yu
41
0
0
12 Oct 2024
Multi-granularity Contrastive Cross-modal Collaborative Generation for End-to-End Long-term Video Question Answering
Ting Yu
Kunhao Fu
Jian Zhang
Qingming Huang
Jun Yu
25
2
0
12 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
M. Lyu
Liwei Wang
VLM
26
0
0
08 Oct 2024
Uncertainty-Guided Enhancement on Driving Perception System via Foundation Models
Yunhao Yang
Yuxin Hu
Mao Ye
Zaiwei Zhang
Zhichao Lu
Yi Xu
Ufuk Topcu
Ben Snyder
26
2
0
02 Oct 2024
Uncertainty-Guided Self-Questioning and Answering for Video-Language Alignment
Jin Chen
Kaijing Ma
Haojian Huang
Jiayu Shen
Han Fang
Xianghao Zang
Chao Ban
79
2
0
17 Sep 2024
From Experts to the Public: Governing Multimodal Language Models in Politically Sensitive Video Analysis
Tanusree Sharma
Yujin Potter
Zachary Kilhoffer
Yun Huang
Dawn Song
Yang Wang
51
3
0
15 Sep 2024
Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization
Jia-Run Du
Kun-Yu Lin
Jingke Meng
Wei-Shi Zheng
26
0
0
25 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
23
10
0
08 Aug 2024
Causal Understanding For Video Question Answering
Bhanu Prakash Reddy Guda
Tanmay Kulkarni
Adithya Sampath
Swarnashree Mysore Sathyendra
CML
39
0
0
23 Jul 2024
Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
Thong Nguyen
Yi Bin
Xiaobao Wu
Xinshuai Dong
Zhiyuan Hu
Khoi M. Le
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
37
5
0
04 Jul 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models
Paul Pu Liang
Akshay Goindani
Talha Chafekar
Leena Mathur
Haofei Yu
Ruslan Salakhutdinov
Louis-Philippe Morency
36
10
0
03 Jul 2024
Burst Image Super-Resolution with Base Frame Selection
Sanghyun Kim
Min Jung Lee
Woohyeok Kim
Deunsol Jung
Jaesung Rim
Sunghyun Cho
Minsu Cho
SupR
23
1
0
25 Jun 2024
Encoding and Controlling Global Semantics for Long-form Video Question Answering
Thong Nguyen
Zhiyuan Hu
Xiaobao Wu
Cong-Duy Nguyen
See-Kiong Ng
A. Luu
35
2
0
30 May 2024
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang
Shoubin Yu
Elias Stengel-Eskin
Jaehong Yoon
Feng Cheng
Gedas Bertasius
Mohit Bansal
40
56
0
29 May 2024
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang
Bo Wu
Sunli Chen
Zhenfang Chen
Haotian Guan
Wei-Ning Lee
Li Erran Li
Chuang Gan
LRM
RALM
29
16
0
15 May 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
34
20
0
09 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
60
56
0
04 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
39
1
0
01 Apr 2024
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Xiaohan Wang
Yuhui Zhang
Orr Zohar
Serena Yeung-Levy
VLM
108
83
0
15 Mar 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
27
29
0
20 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
77
4
0
08 Feb 2024