End-to-End Dense Video Captioning with Parallel Decoding

17 August 2021
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
    3DV

Papers citing "End-to-End Dense Video Captioning with Parallel Decoding"

50 / 102 papers shown
Title
Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols
Iqra Qasim
Alexander Horsch
Dilip K. Prasad
9
4
0
05 Nov 2023
VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools
Huihui Gong
Minjing Dong
Siqi Ma
S. Çamtepe
Chang Xu
Lei Hou
Surya Nepal
VLM
MLLM
42
0
0
16 Oct 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
19
36
0
10 Oct 2023
Learning Interactive Real-World Simulators
Mengjiao Yang
Yilun Du
Kamyar Ghasemipour
Jonathan Tompson
Leslie Kaelbling
Dale Schuurmans
Pieter Abbeel
LM&Ro
PINN
8
174
0
09 Oct 2023
Latent Wander: an Alternative Interface for Interactive and Serendipitous Discovery of Large AV Archives
Yuchen Yang
Linyida Zhang
11
2
0
09 Oct 2023
Write What You Want: Applying Text-to-video Retrieval to Audiovisual Archives
Yuchen Yang
VGen
11
7
0
09 Oct 2023
Human-centric Behavior Description in Videos: New Benchmark and Model
Lingru Zhou
Yi-Meng Gao
Manqing Zhang
Peng Wu
Peng Wang
Yanning Zhang
23
1
0
04 Oct 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
13
26
0
25 Sep 2023
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
Tongtong Yuan
Xuange Zhang
Kun Liu
Bo Liu
Chen Chen
Jian Jin
Zhenzhen Jiao
AI4TS
11
13
0
25 Sep 2023
Collaborative Three-Stream Transformers for Video Captioning
Hao Wang
Libo Zhang
Hengrui Fan
Tiejian Luo
16
6
0
18 Sep 2023
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning
Sijin Chen
Hongyuan Zhu
Mingsheng Li
Xin Chen
Peng Guo
Yinjie Lei
Gang Yu
Taihao Li
Tao Chen
11
17
0
06 Sep 2023
ViGT: Proposal-free Video Grounding with Learnable Token in Transformer
Kun Li
Dan Guo
Meng Wang
ViT
8
36
0
11 Aug 2023
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection
Qi Zhang
S. Zheng
Qin Jin
10
1
0
20 Jul 2023
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment
Yongrae Jo
Seongyun Lee
Aiden Seung Joon Lee
Hyunji Lee
Hanseok Oh
Minjoon Seo
16
1
0
05 Jul 2023
REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction
Zeyi Liu
Arpit Bahety
Shuran Song
LRM
8
114
0
27 Jun 2023
Dense Video Object Captioning from Disjoint Supervision
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
12
2
0
20 Jun 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
40
186
0
29 May 2023
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer
Yifang Xu
Yunzhuo Sun
Yang Li
Yilei Shi
Xiaoxia Zhu
S. Du
ViT
35
33
0
29 Apr 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Erik Cambria
Fatih Porikli
3DV
17
20
0
22 Apr 2023
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
Hassan Mkhallati
A. Cioppa
Silvio Giancola
Bernard Ghanem
Marc Van Droogenbroeck
22
32
0
10 Apr 2023
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
14
34
0
29 Mar 2023
Fine-grained Audible Video Description
Xuyang Shen
Dong Li
Jinxing Zhou
Zhen Qin
Bowen He
...
Yuchao Dai
Lingpeng Kong
Meng Wang
Yu Qiao
Yiran Zhong
VGen
28
11
0
27 Mar 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Shih-Han Chou
James J. Little
Leonid Sigal
13
2
0
14 Mar 2023
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Teng Wang
Jinrui Zhang
Feng Zheng
Wenhao Jiang
Ran Cheng
Ping Luo
VLM
26
10
0
11 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
40
81
0
06 Mar 2023
Models See Hallucinations: Evaluating the Factuality in Video Captioning
Hui Liu
Xiaojun Wan
HILM
25
10
0
06 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
18
219
0
27 Feb 2023
Exploiting Auxiliary Caption for Video Grounding
Hongxiang Li
Meng Cao
Xuxin Cheng
Zhihong Zhu
Yaowei Li
Yuexian Zou
14
10
0
15 Jan 2023
End-to-End 3D Dense Captioning with Vote2Cap-DETR
Sijin Chen
Hongyuan Zhu
Xin Chen
Yinjie Lei
Tao Chen
Gang Yu
ViT
11
51
0
06 Jan 2023
Contextual Explainable Video Representation: Human Perception-based Understanding
Khoa T. Vo
Kashu Yamazaki
Phong H. Nguyen
Pha Nguyen
Khoa Luu
Ngan Le
11
9
0
12 Dec 2022
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
Kashu Yamazaki
Khoa T. Vo
Sang Truong
Bhiksha Raj
Ngan Le
21
34
0
28 Nov 2022
Event and Entity Extraction from Generated Video Captions
Johannes Scherer
A. Scherp
Deepayan Bhowmik
19
0
0
05 Nov 2022
Zero-shot Video Moment Retrieval With Off-the-Shelf Models
Anuj Diwan
Puyuan Peng
Raymond J. Mooney
VLM
23
2
0
03 Nov 2022
EmbryosFormer: Deformable Transformer and Collaborative Encoding-Decoding for Embryos Stage Development Classification
Tien-Phat Nguyen
Trong-Thang Pham
Tri Minh Nguyen
H. Le
Dung Nguyen
Hau Lam
Phong H. Nguyen
Jennifer Fowler
Minh-Triet Tran
Ngan Le
ViT
25
13
0
07 Oct 2022
Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings
Zhihuan Kuang
Shi Zong
Jianbing Zhang
Jiajun Chen
Hongfu Liu
22
4
0
02 Oct 2022
A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos
Anil Batra
Shreyank N. Gowda
Frank Keller
Laura Sevilla-Lara
16
5
0
30 Sep 2022
Recipe Generation from Unsegmented Cooking Videos
Taichi Nishimura
Atsushi Hashimoto
Yoshitaka Ushiku
Hirotaka Kameko
Shinsuke Mori
10
3
0
21 Sep 2022
SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions
Ansh Mittal
Shuvam Ghosal
Rishibha Bansal
30
3
0
24 Jul 2022
Unifying Event Detection and Captioning as Sequence Generation via Pre-Training
Qi Zhang
Yuqing Song
Qin Jin
22
21
0
18 Jul 2022
PIC 4th Challenge: Semantic-Assisted Multi-Feature Encoding and Multi-Head Decoding for Dense Video Captioning
Yifan Lu
Ziqi Zhang
Yuxin Chen
Chunfen Yuan
Bing Li
Weiming Hu
32
1
0
06 Jul 2022
Exploiting Context Information for Generic Event Boundary Captioning
Jinrui Zhang
Teng Wang
Feng Zheng
Ran Cheng
Ping Luo
62
5
0
03 Jul 2022
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning
Kashu Yamazaki
Sang Truong
Khoa T. Vo
Michael Kidd
Chase Rainwater
Khoa Luu
Ngan Le
VLM
CoGe
11
25
0
26 Jun 2022
Toward Clinically Assisted Colorectal Polyp Recognition via Structured Cross-modal Representation Consistency
Weijie Ma
Ye Zhu
Ruimao Zhang
Jie-jin Yang
Yiwen Hu
Zhuguo Li
Lijuan Xiang
ViT
MedIm
4
3
0
23 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
518
0
13 Jun 2022
Future Transformer for Long-term Action Anticipation
Dayoung Gong
Joonseok Lee
Manjin Kim
S. Ha
Minsu Cho
AI4TS
8
59
0
27 May 2022
CapOnImage: Context-driven Dense-Captioning on Image
Yiqi Gao
Xinglin Hou
Yuanmeng Zhang
T. Ge
Yuning Jiang
Peifeng Wang
17
10
0
27 Apr 2022
End-to-end Dense Video Captioning as Sequence Generation
Wanrong Zhu
Bo Pang
Ashish V. Thapliyal
William Yang Wang
Radu Soricut
DiffM
11
32
0
18 Apr 2022
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
Yuxuan Wang
Difei Gao
Licheng Yu
Stan Weixian Lei
Matt Feiszli
Mike Zheng Shou
9
24
0
01 Apr 2022
Visual Abductive Reasoning
Chen Liang
Wenguan Wang
Tianfei Zhou
Yi Yang
LRM
26
38
0
26 Mar 2022
Pretrained Language Models for Text Generation: A Survey
Junyi Li
Tianyi Tang
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
AI4CE
15
120
0
14 Jan 2022