ResearchTrend.AI
VideoXum: Cross-modal Visual and Textural Summarization of Videos

21 March 2023
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
arXiv: 2303.12060

Papers citing "VideoXum: Cross-modal Visual and Textural Summarization of Videos"

27 / 27 papers shown
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
Galann Pennec
Zhengyuan Liu
Nicholas Asher
Philippe Muller
Nancy F. Chen
VGen
21
0
0
10 May 2025
SD-VSum: A Method and Dataset for Script-Driven Video Summarization
Manolis Mylonas
Evlampios Apostolidis
Vasileios Mezaris
24
0
0
06 May 2025
HierSum: A Global and Local Attention Mechanism for Video Summarization
Apoorva Beedu
Irfan Essa
34
0
0
25 Apr 2025
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Yunlong Tang
Jing Bi
Chao Huang
Susan Liang
Daiki Shimada
...
Jinxi He
Liu He
Zeliang Zhang
Jiebo Luo
Chenliang Xu
31
0
0
07 Apr 2025
WikiVideo: Article Generation from Multiple Videos
Alexander Martin
Reno Kriz
William Walden
Kate Sanders
Hannah Recknor
Eugene Yang
Francis Ferraro
Benjamin Van Durme
DiffM
VGen
42
1
0
01 Apr 2025
A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts
Wenzhuo Du
G. Wang
Guancheng Chen
Hang Zhao
X. Li
Jian Gao
65
0
0
08 Mar 2025
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua
Qing Liu
Lingzhi Zhang
Jing Shi
Zhifei Zhang
Yilin Wang
Jianming Zhang
Jiebo Luo
CoGe
VLM
87
6
0
23 Nov 2024
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Hang Hua
Yunlong Tang
Ziyun Zeng
Liangliang Cao
Zhengyuan Yang
Hangfeng He
Chenliang Xu
Jiebo Luo
VLM
CoGe
28
9
0
13 Oct 2024
TeaserGen: Generating Teasers for Long Documentaries
Weihan Xu
Paul Pu Liang
Haven Kim
Julian McAuley
Taylor Berg-Kirkpatrick
Hao-Wen Dong
VGen
VLM
DiffM
22
0
0
08 Oct 2024
Grounding Partially-Defined Events in Multimodal Data
Kate Sanders
Reno Kriz
David Etter
Hannah Recknor
Alexander Martin
Cameron Carpenter
Jingyang Lin
Benjamin Van Durme
22
1
0
07 Oct 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
18
15
0
26 Sep 2024
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
Yuting Mei
Linli Yao
Qin Jin
21
1
0
24 Jun 2024
Converging Dimensions: Information Extraction and Summarization through Multisource, Multimodal, and Multilingual Fusion
Pranav Janjani
Mayank Palan
Sarvesh Shirude
Ninad Shegokar
Sunny Kumar
Faruk Kazi
18
0
0
19 Jun 2024
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models
Haopeng Zhang
Philip S. Yu
Jiawei Zhang
30
1
0
17 Jun 2024
An Empirical Analysis on Large Language Models in Debate Evaluation
Xinyi Liu
Pinxin Liu
Hangfeng He
ELM
21
4
0
28 May 2024
PromptFix: You Prompt and We Fix the Photo
Yongsheng Yu
Ziyun Zeng
Hang Hua
Jianlong Fu
Jiebo Luo
MLLM
DiffM
VLM
33
3
0
27 May 2024
"Previously on ..." From Recaps to Story Summarization
Aditya Kumar Singh
Dhruv Srivastava
Makarand Tapaswi
32
0
0
19 May 2024
FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua
Jing Shi
Kushal Kafle
Simon Jenni
Daoan Zhang
John Collomosse
Scott D. Cohen
Jiebo Luo
CoGe
VLM
39
9
0
23 Apr 2024
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua
Yunlong Tang
Chenliang Xu
Jiebo Luo
VGen
54
22
0
18 Apr 2024
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Mureja Argaw
Seunghyun Yoon
Fabian Caba Heilbron
Hanieh Deilamsalehy
Trung Bui
Zhaowen Wang
Franck Dernoncourt
Joon Son Chung
24
9
0
04 Apr 2024
AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue
Yunlong Tang
Daiki Shimada
Jing Bi
Chenliang Xu
VGen
21
17
0
24 Mar 2024
A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods
Hanlei Jin
Yang Zhang
Dan Meng
Jun Wang
Jinghua Tan
57
76
0
05 Mar 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
47
76
0
29 Dec 2023
Learning to Evaluate the Artness of AI-generated Images
Junyu Chen
Jie An
Hanjia Lyu
Christopher Kanan
Jiebo Luo
EGVM
13
11
0
08 May 2023
Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization
Hang Hua
Xingjian Li
Dejing Dou
Chengzhong Xu
Jiebo Luo
13
15
0
12 Jun 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
398
532
0
21 Jul 2020