2403.18406
Cited By
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
27 March 2024
Wonkyun Kim
Changin Choi
Wonseok Lee
Wonjong Rhee
VLM
Papers citing "An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM"
16 / 16 papers shown
Title
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering
Noriyuki Kugo
Xiang Li
Z. Li
Ashish Gupta
Arpandeep Khatua
...
Yuta Kyuragi
Yasunori Ishii
Masamoto Tanabiki
Kazuki Kozuka
Ehsan Adeli
49
0
0
25 Apr 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
86
8
0
18 Feb 2025
TimeLogic: A Temporal Logic Benchmark for Video QA
S. Swetha
Hilde Kuehne
Mubarak Shah
35
1
0
13 Jan 2025
Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering
Zhongjian Hu
Peng Yang
Bing Li
Fengyuan Liu
LRM
97
56
0
22 Dec 2024
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zenghui Ding
Xianjun Yang
Yining Sun
74
1
0
21 Nov 2024
Question-Answering Dense Video Events
Hangyu Qin
Junbin Xiao
Angela Yao
VLM
65
1
0
06 Sep 2024
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Jongwoo Park
Kanchana Ranasinghe
Kumara Kahatapitiya
Wonjeong Ryoo
Donghyun Kim
Michael S. Ryoo
33
20
0
13 Jun 2024
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
90
27
0
28 Dec 2023
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong
Weihan Wang
Qingsong Lv
Jiazheng Xu
Wenmeng Yu
...
Juanzi Li
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
MLLM
129
310
0
14 Dec 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
182
576
0
16 Nov 2023
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
198
883
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
282
3,163
0
21 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
378
4,010
0
28 Jan 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
212
682
0
13 Oct 2021