ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1503.01070
  4. Cited By
Using Descriptive Video Services to Create a Large Data Source for Video
  Annotation Research

Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research

3 March 2015
Atousa Torabi
C. Pal
Hugo Larochelle
Aaron Courville
    VGen
ArXivPDFHTML

Papers citing "Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research"

25 / 25 papers shown
Title
Audio-Language Datasets of Scenes and Events: A Survey
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
82
25
0
04 Oct 2024
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context
  Learning
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang
K. Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
18
28
0
29 Nov 2023
Beyond Grounding: Extracting Fine-Grained Event Hierarchies Across
  Modalities
Beyond Grounding: Extracting Fine-Grained Event Hierarchies Across Modalities
Hammad A. Ayyubi
Christopher Thomas
Lovish Chum
R. Lokesh
Long Chen
...
Xudong Lin
Xuande Feng
Jaywon Koo
Sounak Ray
Shih-Fu Chang
AI4TS
23
0
0
14 Jun 2022
An Integrated Approach for Video Captioning and Applications
An Integrated Approach for Video Captioning and Applications
Soheyla Amirian
T. Taha
Khaled Rasheed
H. Arabnia
26
1
0
23 Jan 2022
MAD: A Scalable Dataset for Language Grounding in Videos from Movie
  Audio Descriptions
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
Mattia Soldan
Alejandro Pardo
Juan Carlos León Alcázar
Fabian Caba Heilbron
Chen Zhao
Silvio Giancola
Bernard Ghanem
VGen
39
95
0
01 Dec 2021
Understanding Chinese Video and Language via Contrastive Multimodal
  Pre-Training
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong-jin Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
C. Miao
Houqiang Li
28
41
0
19 Apr 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
13
128
0
19 Mar 2021
Enriching Video Captions With Contextual Text
Enriching Video Captions With Contextual Text
Philipp Rimle
Pelin Dogan
Markus Gross
22
3
0
29 Jul 2020
Comprehensive Information Integration Modeling Framework for Video
  Titling
Comprehensive Information Integration Modeling Framework for Video Titling
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Tan Jiang
Jingren Zhou
Hongxia Yang
Fei Wu
21
40
0
24 Jun 2020
VATEX: A Large-Scale, High-Quality Multilingual Dataset for
  Video-and-Language Research
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
15
540
0
06 Apr 2019
MovieGraphs: Towards Understanding Human-Centric Situations from Videos
MovieGraphs: Towards Understanding Human-Centric Situations from Videos
Paul Vicol
Makarand Tapaswi
Lluis Castrejon
Sanja Fidler
20
136
0
19 Dec 2017
DeepStory: Video Story QA by Deep Embedded Memory Networks
DeepStory: Video Story QA by Deep Embedded Memory Networks
Kyung-Min Kim
Min-Oh Heo
Seongho Choi
Byoung-Tak Zhang
19
174
0
04 Jul 2017
The "something something" video database for learning and evaluating
  visual common sense
The "something something" video database for learning and evaluating visual common sense
Raghav Goyal
Samira Ebrahimi Kahou
Vincent Michalski
Joanna Materzynska
S. Westphal
...
Moritz Mueller-Freitag
F. Hoppe
Christian Thurau
Ingo Bax
Roland Memisevic
VLM
13
1,494
0
13 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
13
2,856
0
26 May 2017
Dense-Captioning Events in Videos
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
48
1,214
0
02 May 2017
Multi-Task Video Captioning with Video and Entailment Generation
Multi-Task Video Captioning with Video and Entailment Generation
Ramakanth Pasunuru
Mohit Bansal
25
116
0
24 Apr 2017
Adaptive Feature Abstraction for Translating Video to Text
Adaptive Feature Abstraction for Translating Video to Text
Yunchen Pu
Martin Renqiang Min
Zhe Gan
Lawrence Carin
29
14
0
23 Nov 2016
Title Generation for User Generated Videos
Title Generation for User Generated Videos
Kuo-Hao Zeng
Tseng-Hung Chen
Juan Carlos Niebles
Min Sun
27
68
0
25 Aug 2016
Movie Description
Movie Description
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
C. Pal
Hugo Larochelle
Aaron Courville
Bernt Schiele
3DV
VGen
30
353
0
12 May 2016
TGIF: A New Dataset and Benchmark on Animated GIF Description
TGIF: A New Dataset and Benchmark on Animated GIF Description
Yuncheng Li
Yale Song
Liangliang Cao
Joel R. Tetreault
Larry Goldberg
A. Jaimes
Jiebo Luo
16
269
0
10 Apr 2016
Improving LSTM-based Video Description with Linguistic Knowledge Mined
  from Text
Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text
Subhashini Venugopalan
Lisa Anne Hendricks
Raymond J. Mooney
Kate Saenko
VLM
20
117
0
06 Apr 2016
Describing Multimedia Content using Attention-based Encoder--Decoder
  Networks
Describing Multimedia Content using Attention-based Encoder--Decoder Networks
Kyunghyun Cho
Aaron Courville
Yoshua Bengio
32
411
0
04 Jul 2015
Jointly Modeling Embedding and Translation to Bridge Video and Language
Jointly Modeling Embedding and Translation to Bridge Video and Language
Yingwei Pan
Tao Mei
Ting Yao
Houqiang Li
Y. Rui
29
534
0
07 May 2015
A Dataset for Movie Description
A Dataset for Movie Description
Anna Rohrbach
Marcus Rohrbach
Niket Tandon
Bernt Schiele
VGen
26
498
0
12 Jan 2015
1