1510.07712
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
26 October 2015
Haonan Yu
Jiang Wang
Zhiheng Huang
Yi Yang
W. Xu
Papers citing
"Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks"
50 / 216 papers shown
Title
Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning
Yubo Zhang
Pedro Botelho
Trevor Gordon
Gil Zussman
I. Kadota
50
0
0
31 Mar 2025
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
100
1
0
03 Dec 2024
JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment
Joao Sousa
Roya Darabi
A. A. Sousa
Frank Brueckner
Luís Paulo Reis
Ana Reis
26
1
0
31 Oct 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
24
0
0
22 Oct 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
41
9
1
09 Jun 2024
MICap: A Unified Model for Identity-aware Movie Descriptions
Haran Raajesh
Naveen Reddy Desanur
Zeeshan Khan
Makarand Tapaswi
26
4
0
19 May 2024
Cross-Modal Reasoning with Event Correlation for Video Question Answering
Chengxiang Yin
Zhengping Che
Kun Wu
Zhiyuan Xu
Qinru Qiu
Jian Tang
21
0
0
20 Dec 2023
Multi Sentence Description of Complex Manipulation Action Videos
Fatemeh Ziaeetabar
Reza Safabakhsh
S. Momtazi
M. Tamosiunaite
F. Worgotter
23
1
0
13 Nov 2023
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models
İlker Kesen
Andrea Pedrotti
Mustafa Dogan
Michele Cafagna
Emre Can Acikgoz
...
Iacer Calixto
Anette Frank
Albert Gatt
Aykut Erdem
Erkut Erdem
33
15
0
13 Nov 2023
CLearViD: Curriculum Learning for Video Description
Cheng-Yu Chuang
Pooyan Fazli
28
1
0
08 Nov 2023
Collaborative Three-Stream Transformers for Video Captioning
Hao Wang
Libo Zhang
Hengrui Fan
Tiejian Luo
24
6
0
18 Sep 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
Bang-ju Yang
Fenglin Liu
Zheng Li
Qingyu Yin
Chenyu You
Bing Yin
Yuexian Zou
VLM
26
5
0
05 Jul 2023
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot
Aanisha Bhattacharya
Yaman Kumar Singla
Balaji Krishnamurthy
R. Shah
Changyou Chen
VGen
19
11
0
16 May 2023
Visual Transformation Telling
Wanqing Cui
Mustafa Nasir-Moin
Yanyan Lan
Viola J. Chen
J. Guo
Xueqi Cheng
LRM
51
1
0
03 May 2023
Tensor Decomposition for Model Reduction in Neural Networks: A Review
Xingyi Liu
Keshab K. Parhi
25
12
0
26 Apr 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Erik Cambria
Fatih Porikli
3DV
25
20
0
22 Apr 2023
Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions
Jia-Hong Huang
Modar Alfadly
Bernard Ghanem
M. Worring
OOD
AAML
33
5
0
06 Apr 2023
ADAPT: Action-aware Driving Caption Transformer
Bu Jin
Xinyi Liu
Yupeng Zheng
Pengfei Li
Hao Zhao
Tong Zhang
Yuhang Zheng
Guyue Zhou
Jingjing Liu
20
69
0
01 Feb 2023
METEOR Guided Divergence for Video Captioning
D. Rothenpieler
Shahin Amiriparian
21
3
0
20 Dec 2022
Event and Entity Extraction from Generated Video Captions
Johannes Scherer
A. Scherp
Deepayan Bhowmik
19
0
0
05 Nov 2022
Unsupervised Audio-Visual Lecture Segmentation
Darshan Singh
Anchit Gupta
C. V. Jawahar
Makarand Tapaswi
VOS
16
4
0
29 Oct 2022
Selective Query-guided Debiasing for Video Corpus Moment Retrieval
Sunjae Yoon
Jiajing Hong
Eunseop Yoon
Dahyun Kim
Junyeong Kim
Hee Suk Yoon
Changdong Yoo
33
21
0
17 Oct 2022
Thinking Hallucination for Video Captioning
Nasib Ullah
Partha Pratim Mohanta
VLM
31
4
0
28 Sep 2022
Zero-Shot Video Captioning with Evolving Pseudo-Tokens
Yoad Tewel
Yoav Shalev
Roy Nadler
Idan Schwartz
Lior Wolf
29
28
0
22 Jul 2022
Competence-based Multimodal Curriculum Learning for Medical Report Generation
Fenglin Liu
Shen Ge
Yuexian Zou
Xian Wu
MedIm
25
131
0
24 Jun 2022
DouFu: A Double Fusion Joint Learning Method For Driving Trajectory Representation
Han Wang
Zhou Huang
Xiao Zhou
Ganmin Yin
Yi Bao
Yihang Bao
29
4
0
05 May 2022
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval
Shuai Zhao
Linchao Zhu
Xiaohan Wang
Yi Yang
VLM
CLIP
20
112
0
02 May 2022
Video Captioning: a comparative review of where we are and which could be the route
Daniela Moctezuma
Tania A. Ramirez-delreal
Guillermo Ruiz
Othón González-Chávez
19
11
0
12 Apr 2022
Exploiting long-term temporal dynamics for video captioning
Yuyu Guo
Jingqiu Zhang
Lianli Gao
9
18
0
22 Feb 2022
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions
Arjun Reddy Akula
Song-Chun Zhu
29
3
0
17 Jan 2022
Self-Gated Memory Recurrent Network for Efficient Scalable HDR Deghosting
K. Prabhakar
Susmit Agrawal
R. Venkatesh Babu
16
10
0
24 Dec 2021
Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matrices
Hariom A. Pandya
Brijesh S. Bhatt
36
27
0
07 Dec 2021
Controllable Video Captioning with an Exemplar Sentence
Yitian Yuan
Lin Ma
Jingwen Wang
Wenwu Zhu
11
20
0
02 Dec 2021
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
28
4
0
19 Nov 2021
Video and Text Matching with Conditioned Embeddings
Ameen Ali
Idan Schwartz
Tamir Hazan
Lior Wolf
77
13
0
21 Oct 2021
Trajectory Prediction with Linguistic Representations
Yen-Ling Kuo
Xin Huang
Andrei Barbu
Stephen G. McGill
Boris Katz
J. Leonard
Guy Rosman
21
16
0
19 Oct 2021
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
27
149
0
13 Oct 2021
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
25
179
0
17 Aug 2021
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers
Chiori Hori
Takaaki Hori
Jonathan Le Roux
12
4
0
04 Aug 2021
Boosting Video Captioning with Dynamic Loss Network
Nasib Ullah
Partha Pratim Mohanta
15
1
0
25 Jul 2021
Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
Kaylee Burns
Christopher D. Manning
Li Fei-Fei
11
0
0
20 Jul 2021
Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation
Fenglin Liu
Xian Wu
Shen Ge
Wei Fan
Yuexian Zou
MedIm
21
247
0
13 Jun 2021
Learning to Guide a Saturation-Based Theorem Prover
Ibrahim Abdelaziz
M. Crouse
B. Makni
Vernon Austil
Cristina Cornelio
...
Pavan Kapanipathi
Ndivhuwo Makondo
Kavitha Srinivas
Michael Witbrock
Achille Fokoue
14
16
0
07 Jun 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
10
55
0
24 May 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
309
778
0
18 Apr 2021
Towards Extremely Compact RNNs for Video Recognition with Fully Decomposed Hierarchical Tucker Structure
Miao Yin
Siyu Liao
Xiao-Yang Liu
Xiaodong Wang
Bo Yuan
AI4TS
11
31
0
12 Apr 2021
CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
Luowei Zhou
Jingjing Liu
Yu Cheng
Zhe Gan
Lei Zhang
10
7
0
01 Apr 2021
Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding
Hao Zhou
Chongyang Zhang
Yan Luo
Yanjun Chen
Chuanping Hu
16
52
0
31 Mar 2021
Attention, please! A survey of Neural Attention Models in Deep Learning
Alana de Santana Correia
Esther Luna Colombini
HAI
21
173
0
31 Mar 2021
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
8
17
0
27 Mar 2021