Jointly Modeling Embedding and Translation to Bridge Video and Language

7 May 2015
Yingwei Pan
Tao Mei
Ting Yao
Houqiang Li
Y. Rui

Papers citing "Jointly Modeling Embedding and Translation to Bridge Video and Language"

50 / 199 papers shown
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
British Machine Vision Conference (BMVC), 2019
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
302
424
0
31 Jul 2019
Language2Pose: Natural Language Grounded Pose Forecasting
International Conference on 3D Vision (3DV), 2019
Chaitanya Ahuja
Louis-Philippe Morency
296
337
0
02 Jul 2019
Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019
Zhaofan Qiu
Dong Li
Yehao Li
Qi Cai
Yingwei Pan
Ting Yao
131
8
0
14 Jun 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
IEEE International Conference on Computer Vision (ICCV), 2019
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
542
1,376
0
07 Jun 2019
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019
Wei Zhang
Bairui Wang
Lin Ma
Wei Liu
210
72
0
03 Jun 2019
Memory-Attended Recurrent Network for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2019
Wenjie Pei
Jiyuan Zhang
Xiangrong Wang
Lei Ke
Xiaoyong Shen
Yu-Wing Tai
259
225
0
10 May 2019
Multimodal Semantic Attention Network for Video Captioning
IEEE International Conference on Multimedia and Expo (ICME), 2019
Liang Sun
Bing Li
Chunfen Yuan
Zhengjun Zha
Weiming Hu
177
11
0
08 May 2019
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2019
Jingwen Chen
Yingwei Pan
Yehao Li
Ting Yao
Hongyang Chao
Tao Mei
180
105
0
03 May 2019
Pointing Novel Objects in Image Captioning
Yehao Li
Ting Yao
Yingwei Pan
Hongyang Chao
Tao Mei
205
73
0
25 Apr 2019
Streamlined Dense Video Captioning
Jonghwan Mun
L. Yang
Zhou Ren
N. Xu
Bohyung Han
257
160
1
08 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
Amit K. Roy-Chowdhury
303
212
0
05 Apr 2019
End-to-End Video Captioning
Silvio Olivastri
Gurkirt Singh
Fabio Cuzzolin
150
21
0
04 Apr 2019
Neural Sequential Phrase Grounding (SeqGROUND)
Computer Vision and Pattern Recognition (CVPR), 2019
Pelin Dogan
Leonid Sigal
Markus Gross
ObjD
217
54
0
18 Mar 2019
M-VAD Names: a Dataset for Video Captioning with Naming
Multimedia Tools and Applications (MTA), 2018
S. Pini
Marcella Cornia
Federico Bolelli
Lorenzo Baraldi
Rita Cucchiara
173
29
0
04 Mar 2019
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2019
Nayyer Aafaq
Naveed Akhtar
Wen Liu
Syed Zulqarnain Gilani
Lin Wang
231
222
0
27 Feb 2019
Audio Caption: Listen and Tell
Mengyue Wu
Heinrich Dinkel
Kai Yu
259
69
0
25 Feb 2019
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
Dongliang He
Xiang Zhao
Jizhou Huang
Fu Li
Xiao-Chang Liu
Shilei Wen
212
164
0
21 Jan 2019
Action2Vec: A Crossmodal Embedding Approach to Action Learning
Meera Hahn
Andrew Silva
James M. Rehg
196
59
0
02 Jan 2019
Not All Words are Equal: Video-specific Information Loss for Video Captioning
Jiarong Dong
Ke Gao
Xiaokai Chen
Junbo Guo
Juan Cao
Yongdong Zhang
134
8
0
01 Jan 2019
Hierarchical LSTMs with Adaptive Attention for Visual Captioning
Jingkuan Song
Xiangpeng Li
Lianli Gao
Heng Tao Shen
169
231
0
26 Dec 2018
Middle-Out Decoding
Shikib Mehri
Leonid Sigal
168
22
0
28 Oct 2018
Exploring Visual Relationship for Image Captioning
Ting Yao
Yingwei Pan
Yehao Li
Tao Mei
400
897
0
19 Sep 2018
The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary
Guohao Li
Juan Carlos Niebles
Cees G. M. Snoek
Fabian Caba Heilbron
Humam Alwassel
Victor Escorcia
Ranjay Krishna
S. Buch
Cuong Duc Dao
238
66
0
11 Aug 2018
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction
Xiangxi Shi
Jianfei Cai
Jiuxiang Gu
Shafiq Joty
123
19
0
08 Jul 2018
YH Technologies at ActivityNet Challenge 2018
Ting Yao
Xue Li
110
11
0
29 Jun 2018
Best Vision Technologies Submission to ActivityNet Challenge 2018-Task: Dense-Captioning Events in Videos
Yuan Liu
Moyini Yao
105
1
0
25 Jun 2018
Video Description: A Survey of Methods, Datasets and Evaluation Metrics
Nayyer Aafaq
Lin Wang
Wen Liu
Syed Zulqarnain Gilani
Mubarak Shah
486
101
0
01 Jun 2018
Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation
Qiuyuan Huang
Zhe Gan
Asli Celikyilmaz
D. Wu
Jianfeng Wang
Xiaodong He
BDL
246
96
0
21 May 2018
Memory Matching Networks for One-Shot Image Recognition
Qi Cai
Yingwei Pan
Ting Yao
C. Yan
Tao Mei
VLM
228
285
0
23 Apr 2018
Jointly Localizing and Describing Events for Dense Video Captioning
Yehao Li
Ting Yao
Yingwei Pan
Hongyang Chao
Tao Mei
172
186
0
23 Apr 2018
To Create What You Tell: Generating Videos from Captions
ACM Multimedia (ACM MM), 2017
Yingwei Pan
Zhaofan Qiu
Ting Yao
Houqiang Li
Tao Mei
GAN
226
167
0
23 Apr 2018
To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression
AAAI Conference on Artificial Intelligence (AAAI), 2018
Yitian Yuan
Tao Mei
Wenwu Zhu
316
358
0
19 Apr 2018
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
Antoine Miech
Ivan Laptev
Josef Sivic
339
244
0
07 Apr 2018
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning
Jingwen Wang
Wenhao Jiang
Lin Ma
Wen Liu
Yong-mei Xu
300
226
0
31 Mar 2018
Reconstruction Network for Video Captioning
Bairui Wang
Lin Ma
Wei Zhang
Wen Liu
220
340
0
30 Mar 2018
Less Is More: Picking Informative Frames for Video Captioning
Yangyu Chen
Shuhui Wang
Feiyu Xiong
Qingming Huang
168
207
0
05 Mar 2018
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH)
Pelin Dogan
Boyang Albert Li
Leonid Sigal
Markus Gross
AI4TS
245
24
0
19 Feb 2018
Learning Video-Story Composition via Recurrent Neural Network
Guangyu Zhong
Yi-Hsuan Tsai
Sifei Liu
Zhixun Su
Ming-Hsuan Yang
68
7
0
31 Jan 2018
Video-based Sign Language Recognition without Temporal Segmentation
Jie Huang
Wen-gang Zhou
Qilin Zhang
Houqiang Li
Weiping Li
SLR
265
450
0
30 Jan 2018
Learning Semantic Concepts and Order for Image and Sentence Matching
Yan Huang
Qi Wu
Liang Wang
VLM
205
322
0
06 Dec 2017
A Closer Look at Spatiotemporal Convolutions for Action Recognition
Du Tran
Heng Wang
Lorenzo Torresani
Jamie Ray
Yann LeCun
Manohar Paluri
530
3,417
0
30 Nov 2017
Video Captioning via Hierarchical Reinforcement Learning
Xin Eric Wang
Wenhu Chen
Jiawei Wu
Yuan-fang Wang
William Yang Wang
214
250
0
29 Nov 2017
HP-GAN: Probabilistic 3D human motion prediction via GAN
Emad Barsoum
J. Kender
Zicheng Liu
3DH
247
359
0
27 Nov 2017
Integrating both Visual and Audio Cues for Enhanced Video Caption
Wangli Hao
Zhaoxiang Zhang
He Guan
Guibo Zhu
173
37
0
22 Nov 2017
Functional Map of the World
Gordon A. Christie
Neil Fendley
James Wilson
R. Mukherjee
VGen
341
477
0
21 Nov 2017
Grounded Objects and Interactions for Video Captioning
Chih-Yao Ma
Asim Kadav
I. Melvin
Z. Kira
G. Al-Regib
H. Graf
127
6
0
16 Nov 2017
Attend and Interact: Higher-Order Object Interactions for Video Understanding
Chih-Yao Ma
Asim Kadav
I. Melvin
Z. Kira
G. Al-Regib
H. Graf
185
149
0
16 Nov 2017
ActivityNet Challenge 2017 Summary
Guohao Li
Juan Carlos Niebles
Cees G. M. Snoek
Fabian Caba Heilbron
Humam Alwassel
Ranjay Krishna
Victor Escorcia
Kenji Hata
S. Buch
186
50
0
22 Oct 2017
Anticipating Daily Intention using On-Wrist Motion Triggered Sensing
Tz-Ying Wu
Ting-An Chien
C. Chan
Chan-Wei Hu
Min Sun
143
21
0
20 Oct 2017
Predicting Visual Features from Text for Image and Video Caption Retrieval
Jianfeng Dong
Xirong Li
Cees G. M. Snoek
236
238
0
05 Sep 2017