Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.03051
Cited By
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
4 October 2024
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
Re-assign community
ArXiv
PDF
HTML
Papers citing
"AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark"
14 / 14 papers shown
Title
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Jenq-Neng Hwang
AI4TS
24
0
0
02 May 2025
MR. Video: "MapReduce" is the Principle for Long Video Understanding
Ziqi Pang
Yu-xiong Wang
VLM
25
0
0
22 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
51
0
0
20 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
98
0
0
17 Apr 2025
JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
Yunlong Lin
Zixu Lin
Haoyu Chen
Panwang Pan
C. Li
Sixiang Chen
Yeying Jin
W. J. Li
Xinghao Ding
25
1
0
05 Apr 2025
Slow-Fast Architecture for Video Multi-Modal Large Language Models
Min Shi
Shihao Wang
Chieh-Yun Chen
Jitesh Jain
Kai Wang
Junjun Xiong
Guilin Liu
Zhiding Yu
Humphrey Shi
28
1
0
02 Apr 2025
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
Kanzhi Cheng
Wenpo Song
Jiaxin Fan
Zheng Ma
Qiushi Sun
Fangzhi Xu
Chenyang Yan
Nuo Chen
Jianbing Zhang
Jiajun Chen
MLLM
VLM
40
1
0
16 Mar 2025
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
Yudong Liu
Jingwei Sun
Yueqian Lin
Jingyang Zhang
Ming Yin
Qinsi Wang
J. Zhang
H. Li
Y. Chen
VLM
54
2
0
13 Mar 2025
Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption
Luozheng Qin
Zhiyu Tan
Mengping Yang
Xiaomeng Yang
Hao Li
76
0
0
12 Mar 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Joey Tianyi Zhou
Tony Q. S. Quek
Soujanya Poria
Zuozhu Liu
43
0
0
06 Mar 2025
What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Zhihang Liu
Chen-Wei Xie
Bin Wen
Feiwu Yu
Jixuan Chen
...
Pandeng Li
Yun Zheng
Hongtao Xie
Yun Zheng
Hongtao Xie
VLM
CoGe
90
0
0
19 Feb 2025
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
79
1
0
03 Dec 2024
Efficient Multi-modal Large Language Models via Visual Token Grouping
Minbin Huang
Runhui Huang
Han Shi
Yimeng Chen
Chuanyang Zheng
Xiangguo Sun
Xin Jiang
Z. Li
Hong Cheng
VLM
67
2
0
26 Nov 2024
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Cheng-Yen Yang
Hsiang-Wei Huang
Wenhao Chai
Zhongyu Jiang
Jenq-Neng Hwang
VLM
71
12
0
18 Nov 2024
1