ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2108.07781
  4. Cited By
End-to-End Dense Video Captioning with Parallel Decoding

End-to-End Dense Video Captioning with Parallel Decoding

17 August 2021
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
    3DV
ArXivPDFHTML

Papers citing "End-to-End Dense Video Captioning with Parallel Decoding"

50 / 102 papers shown
Title
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding
Henghao Zhao
Ge-Peng Ji
Rui Yan
Huan Xiong
Zechao Li
21
0
0
10 Apr 2025
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Yunlong Tang
Jing Bi
Chao Huang
Susan Liang
Daiki Shimada
...
Jinxi He
Liu He
Zeliang Zhang
Jiebo Luo
Chenliang Xu
34
0
0
07 Apr 2025
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Eshika Khandelwal
Gül Varol
Weidi Xie
Andrew Zisserman
DiffM
VGen
55
0
0
01 Apr 2025
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
28
0
0
31 Mar 2025
Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding
David Gastager
Ghazal Ghazaei
Constantin Patsch
53
0
0
14 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
51
0
0
12 Mar 2025
Fine-Grained Video Captioning through Scene Graph Consolidation
Fine-Grained Video Captioning through Scene Graph Consolidation
Sanghyeok Chu
Seonguk Seo
Bohyung Han
46
1
0
23 Feb 2025
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
Ji Soo Lee
Jongha Kim
Jeehye Na
Jinyoung Park
H. Kim
VGen
36
0
0
12 Jan 2025
Exploring Temporal Event Cues for Dense Video Captioning in Cyclic
  Co-learning
Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-learning
Zhuyang Xie
Yan Yang
Yankai Yu
Jie Wang
Yongquan Jiang
Xiao-Jun Wu
73
0
0
16 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Mingda Zhang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
98
4
0
12 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Mohit Bansal
Gedas Bertasius
David J. Crandall
109
1
0
12 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
79
0
0
04 Dec 2024
TechCoach: Towards Technical-Point-Aware Descriptive Action Coaching
TechCoach: Towards Technical-Point-Aware Descriptive Action Coaching
Yuan-Ming Li
An-Lan Wang
Kun-Yu Lin
Yu-Ming Tang
Ling-an Zeng
Jian-Fang Hu
Wei-Shi Zheng
93
6
0
26 Nov 2024
I Can Tell What I am Doing: Toward Real-World Natural Language Grounding
  of Robot Experiences
I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences
Zihan Wang
Brian Liang
Varad Dhat
Zander Brumbaugh
Nick Walker
Ranjay Krishna
Maya Cakmak
59
4
0
20 Nov 2024
Grounded Video Caption Generation
Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
23
0
0
12 Nov 2024
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video
  Paragraph Captioning
GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning
Eileen Wang
Caren Han
Josiah Poon
17
0
0
12 Oct 2024
Audio Description Generation in the Era of LLMs and VLMs: A Review of
  Transferable Generative AI Technologies
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies
Yingqiang Gao
Lukas Fischer
Alexa Lintner
Sarah Ebling
27
0
0
11 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Yongxin Guo
Jingyu Liu
Mingda Li
Xiaoying Tang
Qingbin Liu
Xiaoying Tang
20
14
0
08 Oct 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
31
15
0
26 Sep 2024
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
Tieyuan Chen
Huabin Liu
Tianyao He
Yihang Chen
Chaofan Gan
...
Cheng Zhong
Yang Zhang
Yingxue Wang
Hui Lin
Weiyao Lin
VGen
CML
27
4
0
26 Sep 2024
Interpretable Long-term Action Quality Assessment
Interpretable Long-term Action Quality Assessment
Xu Dong
Xinran Liu
Wanqing Li
Anthony Adeyemi-Ejeye
Andrew Gilbert
ViT
22
1
0
21 Aug 2024
COM Kitchens: An Unedited Overhead-view Video Dataset as a
  Vision-Language Benchmark
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
Koki Maeda
Tosho Hirasawa
Atsushi Hashimoto
Jun Harashima
Leszek Rybicki
Yusuke Fukasawa
Yoshitaka Ushiku
35
0
0
05 Aug 2024
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained
  Spatial-Temporal Understanding
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
Quan Kong
Yuki Kawana
Rajat Saini
Ashutosh Kumar
Jingjing Pan
...
Yohei Ozao
Balázs Opra
D. Anastasiu
Yoichi Sato
N. Kobori
VGen
27
7
0
22 Jul 2024
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
Yuting Mei
Linli Yao
Qin Jin
24
1
0
24 Jun 2024
Live Video Captioning
Live Video Captioning
Eduardo Blanco-Fernández
Carlos Gutiérrez-Álvarez
Nadia Nasri
Saturnino Maldonado-Bascón
Roberto J. López-Sastre
24
0
0
20 Jun 2024
Towards Holistic Language-video Representation: the language
  model-enhanced MSR-Video to Text Dataset
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
Yuchen Yang
Yingxuan Duan
VGen
23
0
0
19 Jun 2024
An Effective-Efficient Approach for Dense Multi-Label Action Detection
An Effective-Efficient Approach for Dense Multi-Label Action Detection
Faegheh Sardari
Armin Mustafa
Philip J. B. Jackson
Adrian Hilton
30
0
0
10 Jun 2024
Hierarchical Action Recognition: A Contrastive Video-Language Approach
  with Hierarchical Interactions
Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions
Rui Zhang
Shuailong Li
Junxiao Xue
Feng Lin
Qing Zhang
Xiao Ma
Xiaoran Yan
21
0
0
28 May 2024
MICap: A Unified Model for Identity-aware Movie Descriptions
MICap: A Unified Model for Identity-aware Movie Descriptions
Haran Raajesh
Naveen Reddy Desanur
Zeeshan Khan
Makarand Tapaswi
23
4
0
19 May 2024
Decoding Radiologists' Intentions: A Novel System for Accurate Region
  Identification in Chest X-ray Image Analysis
Decoding Radiologists' Intentions: A Novel System for Accurate Region Identification in Chest X-ray Image Analysis
Akash Awasthi
Safwan Ahmad
Bryant Le
Hien Nguyen
16
0
0
29 Apr 2024
AutoAD III: The Prequel -- Back to the Pixels
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
36
20
0
22 Apr 2024
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt
  Instruction Tuning
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua
Yunlong Tang
Chenliang Xu
Jiebo Luo
VGen
57
22
0
18 Apr 2024
Enhancing Traffic Safety with Parallel Dense Video Captioning for
  End-to-End Event Analysis
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis
Maged Shoman
Dongdong Wang
Armstrong Aboah
Mohamed Abdel-Aty
24
13
0
12 Apr 2024
Do You Remember? Dense Video Captioning with Cross-Modal Memory
  Retrieval
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim
Hyeon Bae Kim
Jinyoung Moon
Jinwoo Choi
Seong Tae Kim
29
16
0
11 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
34
20
0
09 Apr 2024
Streaming Dense Video Captioning
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
26
30
0
01 Apr 2024
Towards Multimodal Video Paragraph Captioning Models Robust to Missing
  Modality
Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality
Sishuo Chen
Lei Li
Shuhuai Ren
Rundong Gao
Yuanxin Liu
Xiaohan Bi
Xu Sun
Lu Hou
27
3
0
28 Mar 2024
OmniVid: A Generative Framework for Universal Video Understanding
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
69
14
0
26 Mar 2024
AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary
  Alignment for Temporal Referential Dialogue
AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue
Yunlong Tang
Daiki Shimada
Jing Bi
Chenliang Xu
VGen
24
17
0
24 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for
  Video Understanding
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
68
0
14 Mar 2024
TutoAI: A Cross-domain Framework for AI-assisted Mixed-media Tutorial
  Creation on Physical Tasks
TutoAI: A Cross-domain Framework for AI-assisted Mixed-media Tutorial Creation on Physical Tasks
Yuexi Chen
Vlad I. Morariu
Anh Truong
Zhicheng Liu
DiffM
VGen
16
3
0
12 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing
  Objects in 3D Scenes
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
30
10
0
12 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
70
177
0
29 Feb 2024
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos
S. DarshanSingh
Zeeshan Khan
Makarand Tapaswi
VLM
CLIP
21
3
0
15 Jan 2024
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Yifan Lu
Ziqi Zhang
Chunfen Yuan
Peng Li
Yan Wang
Bing Li
Weiming Hu
21
3
0
25 Dec 2023
Cross-Modal Reasoning with Event Correlation for Video Question
  Answering
Cross-Modal Reasoning with Event Correlation for Video Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Qinru Qiu
Jian Tang
16
0
0
20 Dec 2023
TimeChat: A Time-sensitive Multimodal Large Language Model for Long
  Video Understanding
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren
Linli Yao
Shicheng Li
Xu Sun
Lu Hou
VLM
MLLM
10
142
0
04 Dec 2023
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities
  Using Web Instructional Videos
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
Takehiko Ohkawa
Takuma Yagi
Taichi Nishimura
Ryosuke Furuta
Atsushi Hashimoto
Yoshitaka Ushiku
Yoichi Sato
EgoV
28
7
0
28 Nov 2023
Efficient Pre-training for Localized Instruction Generation of Videos
Efficient Pre-training for Localized Instruction Generation of Videos
Anil Batra
Davide Moltisanti
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
12
0
0
27 Nov 2023
CLearViD: Curriculum Learning for Video Description
CLearViD: Curriculum Learning for Video Description
Cheng-Yu Chuang
Pooyan Fazli
20
1
0
08 Nov 2023
123
Next