1505.01861
Cited By
Jointly Modeling Embedding and Translation to Bridge Video and Language
7 May 2015
Yingwei Pan
Tao Mei
Ting Yao
Houqiang Li
Y. Rui
Papers citing
"Jointly Modeling Embedding and Translation to Bridge Video and Language"
50 / 52 papers shown
Title
MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field
Zijian Győző Yang
Zhongwei Qiu
Chang Xu
Dongmei Fu
50
2
0
28 Jan 2025
Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
Jianjie Luo
Jingwen Chen
Yehao Li
Yingwei Pan
Jianlin Feng
Hongyang Chao
Ting Yao
DiffM
VLM
45
0
0
03 Jan 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
H. Li
Li Yuan
Shuicheng Yan
Jie Chen
45
1
0
31 Dec 2024
Multi Sentence Description of Complex Manipulation Action Videos
Fatemeh Ziaeetabar
Reza Safabakhsh
S. Momtazi
M. Tamosiunaite
F. Wörgötter
23
1
0
13 Nov 2023
ADAPT: Action-aware Driving Caption Transformer
Bu Jin
Xinyi Liu
Yupeng Zheng
Pengfei Li
Hao Zhao
Tong Zhang
Yuhang Zheng
Guyue Zhou
Jingjing Liu
23
69
0
01 Feb 2023
Prophet Attention: Predicting Attention with Future Attention for Image Captioning
Fenglin Liu
Xuancheng Ren
Xian Wu
Wei Fan
Yuexian Zou
Xu Sun
19
46
0
19 Oct 2022
Cross Modal Compression: Towards Human-comprehensible Semantic Compression
Jiguo Li
Chuanmin Jia
Xinfeng Zhang
Siwei Ma
Wen Gao
9
18
0
06 Sep 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
19
32
0
22 Mar 2022
Syntax Customized Video Captioning by Imitating Exemplar Sentences
Yitian Yuan
Lin Ma
Wenwu Zhu
16
6
0
02 Dec 2021
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
27
149
0
13 Oct 2021
A Survey on Temporal Sentence Grounding in Videos
Xiaohan Lan
Yitian Yuan
Xin Eric Wang
Zhi Wang
Wenwu Zhu
25
47
0
16 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
31
12
0
07 Sep 2021
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Yehao Li
Yingwei Pan
Jingwen Chen
Ting Yao
Tao Mei
VLM
13
31
0
18 Aug 2021
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
30
179
0
17 Aug 2021
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu
Yi Yang
ViT
14
417
0
14 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
13
168
0
01 Nov 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
Chun-Fu Chen
Rameswar Panda
K. Ramakrishnan
Rogerio Feris
J. M. Cohn
A. Oliva
Quanfu Fan
21
95
0
22 Oct 2020
The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval
Giuseppe Amato
Paolo Bolettieri
F. Carrara
Franca Debole
Fabrizio Falchi
Claudio Gennaro
Lucia Vadicamo
Claudio Vairo
8
17
0
06 Aug 2020
Enriching Video Captions With Contextual Text
Philipp Rimle
Pelin Dogan
Markus Gross
9
3
0
29 Jul 2020
Fully Convolutional Networks for Continuous Sign Language Recognition
Ka Leong Cheng
Zhaoyang Yang
Qifeng Chen
Yu-Wing Tai
SLR
21
143
0
24 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
371
0
29 Jun 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
41
491
0
01 May 2020
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Ziqi Zhang
Yaya Shi
Chunfeng Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
16
271
0
26 Feb 2020
SF-Net: Structured Feature Network for Continuous Sign Language Recognition
Zhaoyang Yang
Zhenmei Shi
Xiaoyong Shen
Yu-Wing Tai
SLR
25
63
0
04 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
13
386
0
31 Jul 2019
Language2Pose: Natural Language Grounded Pose Forecasting
Chaitanya Ahuja
Louis-Philippe Morency
17
264
0
02 Jul 2019
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
Jingwen Chen
Yingwei Pan
Yehao Li
Ting Yao
Hongyang Chao
Tao Mei
9
103
0
03 May 2019
Pointing Novel Objects in Image Captioning
Yehao Li
Ting Yao
Yingwei Pan
Hongyang Chao
Tao Mei
22
69
0
25 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
A. Roy-Chowdhury
14
192
0
05 Apr 2019
End-to-End Video Captioning
Silvio Olivastri
Gurkirt Singh
Fabio Cuzzolin
11
18
0
04 Apr 2019
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
Nayyer Aafaq
Naveed Akhtar
W. Liu
Syed Zulqarnain Gilani
Ajmal Saeed Mian
18
203
0
27 Feb 2019
Audio Caption: Listen and Tell
Mengyue Wu
Heinrich Dinkel
Kai Yu
14
61
0
25 Feb 2019
The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary
Bernard Ghanem
Juan Carlos Niebles
Cees G. M. Snoek
Fabian Caba Heilbron
Humam Alwassel
Victor Escorcia
Ranjay Krishna
S. Buch
Cuong Duc Dao
34
65
0
11 Aug 2018
Reconstruction Network for Video Captioning
Bairui Wang
Lin Ma
Wei Zhang
W. Liu
8
316
0
30 Mar 2018
Learning Video-Story Composition via Recurrent Neural Network
Guangyu Zhong
Yi-Hsuan Tsai
Sifei Liu
Zhixun Su
Ming-Hsuan Yang
19
7
0
31 Jan 2018
HP-GAN: Probabilistic 3D human motion prediction via GAN
Emad Barsoum
J. Kender
Zicheng Liu
3DH
36
329
0
27 Nov 2017
PassGAN: A Deep Learning Approach for Password Guessing
B. Hitaj
Paolo Gasti
G. Ateniese
F. Pérez-Cruz
GAN
17
246
0
01 Sep 2017
ConvNet Architecture Search for Spatiotemporal Feature Learning
Du Tran
Jamie Ray
Zheng Shou
Shih-Fu Chang
Manohar Paluri
3DPC
23
381
0
16 Aug 2017
Reinforced Video Captioning with Entailment Rewards
Ramakanth Pasunuru
Mohit Bansal
15
114
0
07 Aug 2017
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
48
1,213
0
02 May 2017
Multi-Task Video Captioning with Video and Entailment Generation
Ramakanth Pasunuru
Mohit Bansal
17
115
0
24 Apr 2017
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition
Chih-Yao Ma
Min-Hung Chen
Z. Kira
G. Al-Regib
AI4TS
22
241
0
30 Mar 2017
Adaptive Feature Abstraction for Translating Video to Text
Yunchen Pu
Martin Renqiang Min
Zhe Gan
Lawrence Carin
24
14
0
23 Nov 2016
Dense Captioning with Joint Inference and Visual Context
L. Yang
K. Tang
Jianchao Yang
Li-Jia Li
VLM
19
169
0
21 Nov 2016
Recurrent Memory Addressing for describing videos
A. Jain
Abhinav Agarwalla
Kumar Krishna Agrawal
Pabitra Mitra
22
10
0
20 Nov 2016
Learning long-term dependencies for action recognition with a biologically-inspired deep network
Yemin Shi
Yonghong Tian
Yaowei Wang
Tiejun Huang
21
62
0
16 Nov 2016
Boosting Image Captioning with Attributes
Ting Yao
Yingwei Pan
Yehao Li
Zhaofan Qiu
Tao Mei
VLM
11
620
0
05 Nov 2016
Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition
Zecheng Xie
Zenghui Sun
Lianwen Jin
Hao Ni
Terry Lyons
25
122
0
09 Oct 2016
Title Generation for User Generated Videos
Kuo-Hao Zeng
Tseng-Hung Chen
Juan Carlos Niebles
Min Sun
21
68
0
25 Aug 2016
Movie Description
Anna Rohrbach
Atousa Torabi
Marcus Rohrbach
Niket Tandon
C. Pal
Hugo Larochelle
Aaron Courville
Bernt Schiele
3DV
VGen
22
353
0
12 May 2016