From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning

IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2017
8 August 2017
Jingkuan Song
Yuyu Guo
Lianli Gao
Xuelong Li
Alan Hanjalic
Heng Tao Shen

Papers citing "From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning"

31 / 31 papers shown
A Statistical Framework for Model Selection in LSTM Networks
Fahad Mostafa
115
1
0
07 Jun 2025
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
275
0
0
22 Oct 2024
How to Understand Named Entities: Using Common Sense for News Captioning
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 2024
Ning Xu
Yanhui Wang
Tingting Zhang
Hongshuo Tian
Mohan Kankanhalli
An-An Liu
230
0
0
11 Mar 2024
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen, VLM
788
93
0
20 Feb 2024
SEM-POS: Grammatically and Semantically Correct Video Captioning
Asmar Nadeem
A. Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
250
10
0
26 Mar 2023
Visual Commonsense-aware Representation Network for Video Captioning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
195
25
0
17 Nov 2022
Hybrid Reinforced Medical Report Generation with M-Linear Attention and Repetition Penalty
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Wenting Xu
Zhenghua Xu
Junyang Chen
Chang Qi
Thomas Lukasiewicz
MedIm
250
18
0
14 Oct 2022
Structured Two-stream Attention Network for Video Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2019
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
304
71
0
02 Jun 2022
Video Captioning: a comparative review of where we are and which could be the route
Computer Vision and Image Understanding (CVIU), 2022
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
288
17
0
12 Apr 2022
NeuroView-RNN: It's About Time
Conference on Fairness, Accountability and Transparency (FAccT), 2022
C. Barberan
Sina Alemohammad
Naiming Liu
Randall Balestriero
Richard G. Baraniuk
AI4TS, HAI
282
3
0
23 Feb 2022
One-shot Scene Graph Generation
ACM Multimedia (ACM MM), 2020
Yuyu Guo
Jingkuan Song
Lianli Gao
Heng Tao Shen
249
31
0
22 Feb 2022
Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions
Machine Intelligence Research (MIR), 2021
Yang Wu
Dingheng Wang
Xiaotong Lu
Fan Yang
Guoqi Li
Weiming Dong
Jianbo Shi
414
18
0
30 Aug 2021
A Comprehensive Review of the Video-to-Text Problem
Artificial Intelligence Review (AIR), 2021
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
314
20
0
27 Mar 2021
The Role of the Input in Natural Language Video Description
IEEE Transactions on Multimedia (TMM), 2020
S. Cascianelli
G. Costante
Alessandro Devo
Thomas Alessandro Ciarfuglia
P. Valigi
M. L. Fravolini
220
5
0
09 Feb 2021
Guidance Module Network for Video Captioning
Cybersecurity and Cyberforensics Conference (CC), 2020
Xiao Zhang
Chunsheng Liu
F. Chang
119
4
0
20 Dec 2020
Universal Weighting Metric Learning for Cross-Modal Matching
Jiwei Wei
Xing Xu
Yang Yang
Yanli Ji
Zheng Wang
Heng Tao Shen
193
100
0
07 Oct 2020
Unsupervised Online Anomaly Detection On Irregularly Sampled Or Missing Valued Time-Series Data Using LSTM Networks
Oguzhan Karaahmetoglu
Fatih Ilhan
Ismail Balaban
Suleyman S. Kozat
AI4TS
201
5
0
25 May 2020
Towards Embodied Scene Description
Sinan Tan
Huaping Liu
Di Guo
Xinyu Zhang
F. Sun
LM&Ro
169
11
0
30 Apr 2020
Learning Selective Sensor Fusion for States Estimation
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2019
Changhao Chen
Stefano Rosa
Chris Xiaoxuan Lu
Bing Wang
Niki Trigoni
Andrew Markham
252
30
0
30 Dec 2019
Characterizing the impact of using features extracted from pre-trained models on the quality of video captioning sequence-to-sequence models
International Conferences on Pattern Recognition and Artificial Intelligence (ICCPRAI), 2019
Menatallh Hammad
May Hammad
Mohamed Elshenawy
122
2
0
22 Nov 2019
Video Captioning with Text-based Dynamic Attention and Step-by-Step Learning
Pattern Recognition Letters (PR), 2019
Huanhou Xiao
Jinglun Shi
172
28
0
05 Nov 2019
Diverse Video Captioning Through Latent Variable Expansion
Pattern Recognition Letters (PR), 2019
Huanhou Xiao
Jinglun Shi
DiffM
406
15
0
26 Oct 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
300
45
0
12 Aug 2019
Adaptive Exploration for Unsupervised Person Re-Identification
Yuhang Ding
Hehe Fan
Mingliang Xu
Yezhou Yang
OOD
262
134
0
09 Jul 2019
Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2019
Junchao Zhang
Yuxin Peng
265
189
0
11 Jun 2019
Hierarchical LSTMs with Adaptive Attention for Visual Captioning
Jingkuan Song
Xiangpeng Li
Lianli Gao
Heng Tao Shen
191
233
0
26 Dec 2018
The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers
Dongxiang Zhang
Lei Wang
Nuo Xu
B. Dai
Heng Tao Shen
ReLM, AIMat
254
143
0
22 Aug 2018
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction
Xiangxi Shi
Jianfei Cai
Jiuxiang Gu
Shafiq Joty
186
19
0
08 Jul 2018
COCO-CN for Cross-Lingual Image Tagging, Captioning and Retrieval
Xirong Li
Chaoxi Xu
Xiaoxu Wang
Weiyu Lan
Zhengxiong Jia
Gang Yang
Jieping Xu
456
183
0
22 May 2018
Less Is More: Picking Informative Frames for Video Captioning
Yangyu Chen
Shuhui Wang
Feiyu Xiong
Qingming Huang
258
207
0
05 Mar 2018
Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder
Jingkuan Song
Hanwang Zhang
Xiangpeng Li
Lianli Gao
Ming Wang
Richang Hong
218
266
0
07 Feb 2018