Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.17435
Cited By
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
29 November 2023
Chaoyi Zhang
K. Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning"
29 / 29 papers shown
Title
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
Galann Pennec
Zhengyuan Liu
Nicholas Asher
Philippe Muller
Nancy F. Chen
VGen
9
0
0
10 May 2025
FocusedAD: Character-centric Movie Audio Description
Xiaojun Ye
C. Wang
Yiren Song
Sheng Zhou
Liangcheng Li
Jiajun Bu
VGen
40
0
0
16 Apr 2025
VideoAgent2: Enhancing the LLM-Based Agent System for Long-Form Video Understanding by Uncertainty-Aware CoT
Zhuo Zhi
Qiangqiang Wu
Minghe shen
W. J. Li
Yinchuan Li
Kun Shao
Kaiwen Zhou
LLMAG
28
0
0
06 Apr 2025
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Eshika Khandelwal
Gül Varol
Weidi Xie
Andrew Zisserman
DiffM
VGen
55
0
0
01 Apr 2025
Learning to Generate Long-term Future Narrations Describing Activities of Daily Living
Ramanathan Rajendiran
Debaditya Roy
Basura Fernando
VGen
36
0
0
03 Mar 2025
Parameter-free Video Segmentation for Vision and Language Understanding
Louis Mahon
Mirella Lapata
VLM
32
1
0
03 Mar 2025
Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs
Gengyuan Zhang
Mingcong Ding
Tong Liu
Yao Zhang
Volker Tresp
71
1
0
24 Feb 2025
Natural Language Generation from Visual Sequences: Challenges and Future Directions
Aditya K Surikuchi
Raquel Fernández
Sandro Pezzelle
EGVM
88
0
0
18 Feb 2025
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
Shilin Sun
Wenbin An
Feng Tian
Fang Nan
Qidong Liu
J. Liu
N. Shah
Ping Chen
72
2
0
18 Dec 2024
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Hongrui Jia
Chaoya Jiang
Haiyang Xu
Wei Ye
Mengfan Dong
Ming Yan
Ji Zhang
Fei Huang
Shikun Zhang
MLLM
83
2
0
17 Nov 2024
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
Yichen He
Yuan Lin
Jianchao Wu
Hanchong Zhang
Yuchen Zhang
Ruicheng Le
VGen
VLM
41
2
0
11 Nov 2024
Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents
Shiwei Wu
Chen Zhang
Yan Gao
Qimeng Wang
Tong Bill Xu
Yao Hu
Enhong Chen
ELM
22
1
0
01 Oct 2024
StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models
Y. Guo
Faizan Siddiqui
Yang Zhao
Rama Chellappa
Shao-Yuan Lo
LRM
22
2
0
31 Aug 2024
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Dingyi Yang
Qin Jin
26
5
0
26 Aug 2024
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linfeng Ren
Linjie Li
Jianfeng Wang
K. Lin
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
Xinchao Wang
VLM
MLLM
25
17
0
01 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
38
5
0
31 Jul 2024
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
25
7
0
22 Jul 2024
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
33
4
0
22 Apr 2024
Movie101v2: Improved Movie Narration Benchmark
Zihao Yue
Yepeng Zhang
Ziheng Wang
Qin Jin
VGen
14
1
0
20 Apr 2024
Contextual AD Narration with Interleaved Multimodal Sequence
Hanlin Wang
Zhan Tong
Kecheng Zheng
Yujun Shen
Limin Wang
VGen
43
3
0
19 Mar 2024
"It's Kind of Context Dependent": Understanding Blind and Low Vision People's Video Accessibility Preferences Across Viewing Scenarios
Lucy Jiang
Crescentia Jung
Mahika Phutane
Abigale Stangl
Shiri Azenkot
33
2
0
16 Mar 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
47
76
0
29 Dec 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Text-Only Training for Image Captioning using Noise-Injected CLIP
David Nukrai
Ron Mokady
Amir Globerson
VLM
CLIP
41
69
0
01 Nov 2022
Complexity-Based Prompting for Multi-Step Reasoning
Yao Fu
Hao-Chun Peng
Ashish Sabharwal
Peter Clark
Tushar Khot
ReLM
LRM
152
298
0
03 Oct 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
237
453
0
24 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
What Makes Good In-Context Examples for GPT-
3
3
3
?
Jiachang Liu
Dinghan Shen
Yizhe Zhang
Bill Dolan
Lawrence Carin
Weizhu Chen
AAML
RALM
275
1,296
0
17 Jan 2021
1