ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.08271
  4. Cited By
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal
  Transformer

A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer

17 May 2020
Vladimir E. Iashin
Esa Rahtu
ArXivPDFHTML

Papers citing "A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer"

18 / 18 papers shown
Title
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
79
25
0
04 Oct 2024
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Jiafeng Liang
Shixin Jiang
Zekun Wang
Haojie Pan
Zerui Chen
Zheng Chu
Ming Liu
Ruiji Fu
Zhongyuan Wang
Bing Qin
29
2
0
26 Jun 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing
  Objects in 3D Scenes
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
37
10
0
12 Mar 2024
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
19
36
0
10 Oct 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Hierarchical Video-Moment Retrieval and Step-Captioning
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Ouguz
Yasher Mehdad
Mohit Bansal
3DV
18
51
0
29 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense
  Video Captioning
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
23
220
0
27 Feb 2023
Event and Entity Extraction from Generated Video Captions
Event and Entity Extraction from Generated Video Captions
Johannes Scherer
A. Scherp
Deepayan Bhowmik
19
0
0
05 Nov 2022
A Closer Look at Temporal Ordering in the Segmentation of Instructional
  Videos
A Closer Look at Temporal Ordering in the Segmentation of Instructional Videos
Anil Batra
Shreyank N. Gowda
Frank Keller
Laura Sevilla-Lara
24
5
0
30 Sep 2022
Multimodal Frame-Scoring Transformer for Video Summarization
Multimodal Frame-Scoring Transformer for Video Summarization
Jeiyoon Park
Kiho Kwoun
Chanhee Lee
Heuiseok Lim
ViT
25
6
0
05 Jul 2022
Automated Audio Captioning: An Overview of Recent Progress and New
  Challenges
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
27
37
0
12 May 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
27
164
0
20 Jan 2022
Video Transformers: A Survey
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
102
0
16 Jan 2022
Tell me what you see: A zero-shot action recognition method based on
  natural language descriptions
Tell me what you see: A zero-shot action recognition method based on natural language descriptions
Valter Estevam
Rayson Laroca
David Menotti
Hélio Pedrini
25
13
0
18 Dec 2021
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
28
4
0
19 Nov 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal
  Attention
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
31
12
0
07 Sep 2021
End-to-End Dense Video Captioning with Parallel Decoding
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
30
179
0
17 Aug 2021
CLIP-It! Language-Guided Video Summarization
CLIP-It! Language-Guided Video Summarization
Medhini Narasimhan
Anna Rohrbach
Trevor Darrell
CLIP
15
113
0
01 Jul 2021
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization
  Tasks
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Bernard Ghanem
28
123
0
23 Nov 2020
1