Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.05588
Cited By
Watching the News: Towards VideoQA Models that can Read
10 November 2022
Soumya Jahagirdar
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Watching the News: Towards VideoQA Models that can Read"
14 / 14 papers shown
Title
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes
Ji Qi
Y. Yao
Yushi Bai
Bin Xu
Juanzi Li
Zhiyuan Liu
Tat-Seng Chua
29
0
0
21 Apr 2025
LiveVQA: Live Visual Knowledge Seeking
Mingyang Fu
Yuyang Peng
Benlin Liu
Yao Wan
D. Z. Chen
28
0
0
07 Apr 2025
Scene-Text Grounding for Text-Based Video Question Answering
Sheng Zhou
Junbin Xiao
Xun Yang
Peipei Song
Dan Guo
Angela Yao
Meng Wang
Tat-Seng Chua
104
1
0
22 Sep 2024
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
Zhixian He
Pengcheng Zhao
Fuwei Zhang
Shujin Lin
36
0
0
14 Sep 2024
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
Anwen Hu
Haiyang Xu
Liang Zhang
Jiabo Ye
Ming Yan
Ji Zhang
Qin Jin
Fei Huang
Jingren Zhou
VLM
30
27
0
05 Sep 2024
A Survey of Video Datasets for Grounded Event Understanding
Kate Sanders
Benjamin Van Durme
32
4
0
14 Jun 2024
Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera
Inpyo Song
Minjun Joo
Joonhyung Kwon
Jangwon Lee
EgoV
41
0
0
30 May 2024
Understanding Video Scenes through Text: Insights from Text-based Video Question Answering
Soumya Jahagirdar
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
17
1
0
04 Sep 2023
Making the V in Text-VQA Matter
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
CoGe
29
4
0
01 Aug 2023
Reading Between the Lanes: Text VideoQA on the Road
George Tom
Minesh Mathew
Sergi Garcia
Dimosthenis Karatzas
C. V. Jawahar
17
6
0
08 Jul 2023
MultiVENT: Multilingual Videos of Events with Aligned Natural Text
Kate Sanders
David Etter
Reno Kriz
Benjamin Van Durme
VGen
26
7
0
06 Jul 2023
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
Isha Rawal
Alexander Matyasko
Shantanu Jaiswal
Basura Fernando
Cheston Tan
16
1
0
15 Jun 2023
Weakly Supervised Visual Question Answer Generation
Charani Alampalle
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
9
0
0
11 Jun 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
390
4,124
0
28 Jan 2022
1