Watching the News: Towards VideoQA Models that can Read

10 November 2022
Soumya Jahagirdar
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
Papers citing "Watching the News: Towards VideoQA Models that can Read"

14 / 14 papers shown
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes
Ji Qi
Y. Yao
Yushi Bai
Bin Xu
Juanzi Li
Zhiyuan Liu
Tat-Seng Chua
29
0
0
21 Apr 2025
LiveVQA: Live Visual Knowledge Seeking
Mingyang Fu
Yuyang Peng
Benlin Liu
Yao Wan
D. Z. Chen
28
0
0
07 Apr 2025
Scene-Text Grounding for Text-Based Video Question Answering
Sheng Zhou
Junbin Xiao
Xun Yang
Peipei Song
Dan Guo
Angela Yao
Meng Wang
Tat-Seng Chua
104
1
0
22 Sep 2024
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
Zhixian He
Pengcheng Zhao
Fuwei Zhang
Shujin Lin
36
0
0
14 Sep 2024
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
Anwen Hu
Haiyang Xu
Liang Zhang
Jiabo Ye
Ming Yan
Ji Zhang
Qin Jin
Fei Huang
Jingren Zhou
VLM
30
27
0
05 Sep 2024
A Survey of Video Datasets for Grounded Event Understanding
Kate Sanders
Benjamin Van Durme
32
4
0
14 Jun 2024
Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera
Inpyo Song
Minjun Joo
Joonhyung Kwon
Jangwon Lee
EgoV
41
0
0
30 May 2024
Understanding Video Scenes through Text: Insights from Text-based Video Question Answering
Soumya Jahagirdar
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
17
1
0
04 Sep 2023
Making the V in Text-VQA Matter
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
CoGe
29
4
0
01 Aug 2023
Reading Between the Lanes: Text VideoQA on the Road
George Tom
Minesh Mathew
Sergi Garcia
Dimosthenis Karatzas
C. V. Jawahar
17
6
0
08 Jul 2023
MultiVENT: Multilingual Videos of Events with Aligned Natural Text
Kate Sanders
David Etter
Reno Kriz
Benjamin Van Durme
VGen
26
7
0
06 Jul 2023
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
Isha Rawal
Alexander Matyasko
Shantanu Jaiswal
Basura Fernando
Cheston Tan
16
1
0
15 Jun 2023
Weakly Supervised Visual Question Answer Generation
Charani Alampalle
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
9
0
0
11 Jun 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
390
4,124
0
28 Jan 2022