Jointly Modeling Embedding and Translation to Bridge Video and Language
arXiv:1505.01861 (latest version: v3)

7 May 2015
Yingwei Pan
Tao Mei
Ting Yao
Houqiang Li
Y. Rui

Papers citing "Jointly Modeling Embedding and Translation to Bridge Video and Language"

50 / 199 papers shown
3D Human motion anticipation and classification
Emad Barsoum
J. Kender
Zicheng Liu
3DH
117
2
0
31 Dec 2020
Guidance Module Network for Video Captioning
Cybersecurity and Cyberforensics Conference (CC), 2020
Xiao Zhang
Chunsheng Liu
F. Chang
91
4
0
20 Dec 2020
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV, VLM
185
5
0
30 Nov 2020
SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries
IEEE Transactions on Multimedia (TMM), 2020
Xirong Li
Fangming Zhou
Chaoxi Xu
Jiaqi Ji
Gang Yang
175
61
0
24 Nov 2020
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
169
3
0
18 Nov 2020
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus
Bowen Zhang
Hexiang Hu
Joonseok Lee
Mingde Zhao
Sheide Chammas
Vihan Jain
Eugene Ie
Fei Sha
175
39
0
18 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations
Computer Vision and Pattern Recognition (CVPR), 2020
Linchao Zhu
Yi Yang
ViT
270
450
0
14 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Neural Information Processing Systems (NeurIPS), 2020
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT, CLIP
180
178
0
01 Nov 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
Chun-Fu Chen
Yikang Shen
K. Ramakrishnan
Rogerio Feris
J. M. Cohn
A. Oliva
Quanfu Fan
253
116
0
22 Oct 2020
Video captioning with stacked attention and semantic hard pull
PeerJ Computer Science (PeerJ Comput. Sci.), 2020
Md. Mushfiqur Rahman
Thasinul Abedin
Khondokar S. S. Prottoy
Ayana Moshruba
Fazlul Hasan Siddiqui
185
2
0
15 Sep 2020
Text-based Localization of Moments in a Video Corpus
Sudipta Paul
Niluthpol Chowdhury Mithun
Amit K. Roy-Chowdhury
158
21
0
20 Aug 2020
The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval
Journal of Imaging (JI), 2020
Giuseppe Amato
Paolo Bolettieri
F. Carrara
Franca Debole
Fabrizio Falchi
Claudio Gennaro
Lucia Vadicamo
Claudio Vairo
265
20
0
06 Aug 2020
Graph Wasserstein Correlation Analysis for Movie Retrieval
European Conference on Computer Vision (ECCV), 2020
Xueyao Zhang
Tong Zhang
Xiaobin Hong
Zhen Cui
Zhiqiang Wang
91
2
0
06 Aug 2020
Enriching Video Captions With Contextual Text
International Conference on Pattern Recognition (ICPR), 2020
Philipp Rimle
Pelin Dogan
Markus Gross
145
3
0
29 Jul 2020
Latent Unexpected Recommendations
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2020
Pan Li
Alexander Tuzhilin
60
15
0
27 Jul 2020
Fully Convolutional Networks for Continuous Sign Language Recognition
European Conference on Computer Vision (ECCV), 2020
Ka Leong Cheng
Zhaoyang Yang
Qifeng Chen
Yu-Wing Tai
SLR
211
185
0
24 Jul 2020
Knowledge Graph Extraction from Videos
Louis Mahon
Eleonora Giunchiglia
Bowen Li
Thomas Lukasiewicz
101
21
0
20 Jul 2020
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions
European Conference on Computer Vision (ECCV), 2020
Noa Garcia
Yuta Nakashima
224
35
0
17 Jul 2020
COBE: Contextualized Object Embeddings from Narrated Instructional Video
Neural Information Processing Systems (NeurIPS), 2020
Gedas Bertasius
Lorenzo Torresani
185
26
0
14 Jul 2020
Single Shot Video Object Detector
Jiajun Deng
Yingwei Pan
Ting Yao
Wen-gang Zhou
Houqiang Li
Tao Mei
ObjD
152
47
0
07 Jul 2020
VPN: Learning Video-Pose Embedding for Activities of Daily Living
Srijan Das
Saurav Sharma
Rui Dai
Francois Bremond
Monique Thonnat
ViT
274
154
0
06 Jul 2020
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training
Yingwei Pan
Yehao Li
Jianjie Luo
Jun Xu
Ting Yao
Tao Mei
179
61
0
05 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
369
396
0
29 Jun 2020
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and Videos
Benet Oriol
Jordi Luque
Ferran Diego
Xavier Giró-i-Nieto
69
0
0
01 Jun 2020
Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding
Fenglin Liu
Xuancheng Ren
Guangxiang Zhao
Chenyu You
Xuewei Ma
Xian Wu
Xu Sun
402
2
0
16 May 2020
Text Synopsis Generation for Egocentric Videos
Aidean Sharghi
N. Lobo
M. Shah
DiffM, EgoV
148
1
0
08 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM, VLM, OffRL, AI4TS
605
536
0
01 May 2020
Feature Re-Learning with Data Augmentation for Video Relevance Prediction
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2020
Jianfeng Dong
Xun Wang
Leimin Zhang
Chaoxi Xu
Gang Yang
Xirong Li
138
14
0
08 Apr 2020
Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data
Computer Vision and Pattern Recognition (CVPR), 2020
W. Ramos
M. Silva
Edson R. Araujo
Leandro Soriano Marcolino
Erickson R. Nascimento
VGen
97
6
0
31 Mar 2020
Predicting the Popularity of Micro-videos with Multimodal Variational Encoder-Decoder Framework
IEEE Transactions on Multimedia (TMM), 2020
Yaochen Zhu
Jiayi Xie
Zhenzhong Chen
88
33
0
28 Mar 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
AAAI Conference on Artificial Intelligence (AAAI), 2020
Elad Amrani
Rami Ben-Ari
Daniel Rotman
A. Bronstein
316
129
0
06 Mar 2020
Hierarchical Memory Decoding for Video Captioning
Aming Wu
Yahong Han
131
2
0
27 Feb 2020
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2020
Ziqi Zhang
Yaya Shi
Chunfen Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
188
304
0
26 Feb 2020
Learning Spatiotemporal Features via Video and Text Pair Discrimination
Tianhao Li
Limin Wang
VGen
132
60
0
16 Jan 2020
Vision and Language: from Visual Perception to Content Creation
APSIPA Transactions on Signal and Information Processing (APSIPA TSIP), 2019
Tao Mei
Wei Zhang
Ting Yao
VLM
170
8
0
26 Dec 2019
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2019
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
VGen, SSL
458
754
0
13 Dec 2019
Transform-Invariant Convolutional Neural Networks for Image Classification and Search
ACM Multimedia (MM), 2016
Xu Shen
Xinmei Tian
Anfeng He
Shaoyan Sun
Dacheng Tao
OOD
114
44
0
28 Nov 2019
Patch Reordering: a Novel Way to Achieve Rotation and Translation Invariance in Convolutional Neural Networks
AAAI Conference on Artificial Intelligence (AAAI), 2017
Xu Shen
Xinmei Tian
Shaoyan Sun
Dacheng Tao
101
7
0
28 Nov 2019
Empirical Autopsy of Deep Video Captioning Frameworks
Nayyer Aafaq
Naveed Akhtar
Wei Liu
Lin Wang
115
6
0
21 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2019
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI, AI4TS
279
396
0
10 Nov 2019
Video Captioning with Text-based Dynamic Attention and Step-by-Step Learning
Pattern Recognition Letters (PR), 2019
Huanhou Xiao
Jinglun Shi
109
27
0
05 Nov 2019
Diverse Video Captioning Through Latent Variable Expansion
Pattern Recognition Letters (PR), 2019
Huanhou Xiao
Jinglun Shi
DiffM
282
15
0
26 Oct 2019
Dynamic Joint Variational Graph Autoencoders
Sedigheh Mahdavi
Shima Khoshraftar
Aijun An
BDL
91
24
0
04 Oct 2019
Translation, Sentiment and Voices: A Computational Model to Translate and Analyse Voices from Real-Time Video Calling
A. Roy
79
1
0
28 Sep 2019
A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling
Frontiers in Robotics and AI (Front. Robot. AI), 2019
Haoran Chen
Ke Lin
A. Maye
Jianmin Li
Xiaoling Hu
153
49
0
31 Aug 2019
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
IEEE International Conference on Computer Vision (ICCV), 2019
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
206
176
0
27 Aug 2019
Mocycle-GAN: Unpaired Video-to-Video Translation
ACM Multimedia (ACM MM), 2019
Yang Chen
Yingwei Pan
Ting Yao
Xinmei Tian
Tao Mei
GAN
135
94
0
26 Aug 2019
Relation Distillation Networks for Video Object Detection
IEEE International Conference on Computer Vision (ICCV), 2019
Jiajun Deng
Yingwei Pan
Ting Yao
Wen-gang Zhou
Houqiang Li
Tao Mei
ObjD
248
201
0
26 Aug 2019
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
IEEE International Conference on Computer Vision (ICCV), 2019
Iro Laina
Christian Rupprecht
Nassir Navab
SSL
154
112
0
25 Aug 2019
SF-Net: Structured Feature Network for Continuous Sign Language Recognition
Zhaoyang Yang
Zhenmei Shi
Xiaoyong Shen
Yu-Wing Tai
SLR
111
71
0
04 Aug 2019