arXiv:2303.12060
VideoXum: Cross-modal Visual and Textural Summarization of Videos
21 March 2023
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
Papers citing "VideoXum: Cross-modal Visual and Textural Summarization of Videos"
27 / 27 papers shown
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
Galann Pennec
Zhengyuan Liu
Nicholas Asher
Philippe Muller
Nancy F. Chen
VGen
9
0
0
10 May 2025
SD-VSum: A Method and Dataset for Script-Driven Video Summarization
Manolis Mylonas
Evlampios Apostolidis
Vasileios Mezaris
20
0
0
06 May 2025
HierSum: A Global and Local Attention Mechanism for Video Summarization
Apoorva Beedu
Irfan Essa
25
0
0
25 Apr 2025
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Yunlong Tang
Jing Bi
Chao Huang
Susan Liang
Daiki Shimada
...
Jinxi He
Liu He
Zeliang Zhang
Jiebo Luo
Chenliang Xu
28
0
0
07 Apr 2025
WikiVideo: Article Generation from Multiple Videos
Alexander Martin
Reno Kriz
William Walden
Kate Sanders
Hannah Recknor
Eugene Yang
Francis Ferraro
Benjamin Van Durme
DiffM
VGen
38
1
0
01 Apr 2025
A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts
Wenzhuo Du
G. Wang
Guancheng Chen
Hang Zhao
X. Li
Jian Gao
65
0
0
08 Mar 2025
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua
Qing Liu
Lingzhi Zhang
Jing Shi
Zhifei Zhang
Yilin Wang
Jianming Zhang
Jiebo Luo
CoGe
VLM
87
6
0
23 Nov 2024
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Hang Hua
Yunlong Tang
Ziyun Zeng
Liangliang Cao
Zhengyuan Yang
Hangfeng He
Chenliang Xu
Jiebo Luo
VLM
CoGe
23
9
0
13 Oct 2024
TeaserGen: Generating Teasers for Long Documentaries
Weihan Xu
Paul Pu Liang
Haven Kim
Julian McAuley
Taylor Berg-Kirkpatrick
Hao-Wen Dong
VGen
VLM
DiffM
22
0
0
08 Oct 2024
Grounding Partially-Defined Events in Multimodal Data
Kate Sanders
Reno Kriz
David Etter
Hannah Recknor
Alexander Martin
Cameron Carpenter
Jingyang Lin
Benjamin Van Durme
22
1
0
07 Oct 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
15
15
0
26 Sep 2024
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
Yuting Mei
Linli Yao
Qin Jin
19
1
0
24 Jun 2024
Converging Dimensions: Information Extraction and Summarization through Multisource, Multimodal, and Multilingual Fusion
Pranav Janjani
Mayank Palan
Sarvesh Shirude
Ninad Shegokar
Sunny Kumar
Faruk Kazi
14
0
0
19 Jun 2024
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models
Haopeng Zhang
Philip S. Yu
Jiawei Zhang
30
1
0
17 Jun 2024
An Empirical Analysis on Large Language Models in Debate Evaluation
Xinyi Liu
Pinxin Liu
Hangfeng He
ELM
17
4
0
28 May 2024
PromptFix: You Prompt and We Fix the Photo
Yongsheng Yu
Ziyun Zeng
Hang Hua
Jianlong Fu
Jiebo Luo
MLLM
DiffM
VLM
33
3
0
27 May 2024
"Previously on ..." From Recaps to Story Summarization
Aditya Kumar Singh
Dhruv Srivastava
Makarand Tapaswi
30
0
0
19 May 2024
FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua
Jing Shi
Kushal Kafle
Simon Jenni
Daoan Zhang
John Collomosse
Scott D. Cohen
Jiebo Luo
CoGe
VLM
36
9
0
23 Apr 2024
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua
Yunlong Tang
Chenliang Xu
Jiebo Luo
VGen
52
22
0
18 Apr 2024
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Mureja Argaw
Seunghyun Yoon
Fabian Caba Heilbron
Hanieh Deilamsalehy
Trung Bui
Zhaowen Wang
Franck Dernoncourt
Joon Son Chung
20
9
0
04 Apr 2024
AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue
Yunlong Tang
Daiki Shimada
Jing Bi
Chenliang Xu
VGen
14
10
0
24 Mar 2024
A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods
Hanlei Jin
Yang Zhang
Dan Meng
Jun Wang
Jinghua Tan
57
76
0
05 Mar 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
47
76
0
29 Dec 2023
Learning to Evaluate the Artness of AI-generated Images
Junyu Chen
Jie An
Hanjia Lyu
Christopher Kanan
Jiebo Luo
EGVM
11
11
0
08 May 2023
Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization
Hang Hua
Xingjian Li
Dejing Dou
Chengzhong Xu
Jiebo Luo
13
15
0
12 Jun 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
396
532
0
21 Jul 2020