arXiv:1711.08097

Integrating both Visual and Audio Cues for Enhanced Video Caption

22 November 2017
Wangli Hao
Zhaoxiang Zhang
He Guan
Guibo Zhu

Papers citing "Integrating both Visual and Audio Cues for Enhanced Video Caption"

Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Yating Xu
Conghui Hu
Gim Hee Lee
219
8
0
14 Nov 2023
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment
IEEE International Conference on Computer Vision (ICCV), 2023
Sarah Ibrahimi
Xiaohang Sun
Pichao Wang
Amanmeet Garg
Ashutosh Sanan
Mohamed Omar
342
37
0
24 Jul 2023
Visual Sensation and Perception Computational Models for Deep Learning: State of the art, Challenges and Prospects
Bing Wei
Yudi Zhao
K. Hao
Lei Gao
274
5
0
08 Sep 2021
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer
Vladimir E. Iashin
Esa Rahtu
283
128
0
17 May 2020
Multi-modal Dense Video Captioning
Vladimir E. Iashin
Esa Rahtu
494
207
0
17 Mar 2020
A Case Study on Combining ASR and Visual Features for Generating Instructional Video Captions
Conference on Computational Natural Language Learning (CoNLL), 2019
Jack Hessel
Bo Pang
Zhenhai Zhu
Radu Soricut
216
39
0
07 Oct 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
IEEE International Conference on Computer Vision (ICCV), 2019
Tanzila Rahman
Bicheng Xu
Leonid Sigal
276
88
0
22 Sep 2019
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2019
Jingwen Chen
Yingwei Pan
Yehao Li
Ting Yao
Hongyang Chao
Tao Mei
251
107
0
03 May 2019
Coupled Recurrent Network (CRN)
Lin Sun
Kui Jia
Yuejia Shen
Silvio Savarese
Dit-Yan Yeung
Bertram E. Shi
177
5
0
25 Dec 2018