1904.01766
VideoBERT: A Joint Model for Video and Language Representation Learning
3 April 2019
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
Papers citing
"VideoBERT: A Joint Model for Video and Language Representation Learning"
50 / 803 papers shown
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
Jun Chen
Deyao Zhu
Kilichbek Haydarov
Xiang Li
Mohamed Elhoseiny
264
44
0
09 Apr 2023
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data
Vladislav Lialin
Stephen Rawls
David M. Chan
Shalini Ghosh
Anna Rumshisky
Wael Hamza
VLM
AI4TS
267
8
0
04 Apr 2023
Beyond Unimodal: Generalising Neural Processes for Multimodal Uncertainty Estimation
Neural Information Processing Systems (NeurIPS), 2023
M. Jung
He Zhao
Joanna Dipnall
Lan Du
UQCV
BDL
259
11
0
04 Apr 2023
Unbiased Scene Graph Generation in Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Sayak Nag
Kyle Min
Subarna Tripathi
Amit K. Roy-Chowdhury
428
40
0
03 Apr 2023
Procedure-Aware Pretraining for Instructional Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Honglu Zhou
Roberto Martín-Martín
Mubbasir Kapadia
Silvio Savarese
Juan Carlos Niebles
290
55
0
31 Mar 2023
Self-Supervised Multimodal Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
319
89
0
31 Mar 2023
Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations
Computer Vision and Pattern Recognition (CVPR), 2023
Yiwu Zhong
Licheng Yu
Yang Bai
Shangwen Li
Xueting Yan
Yin Li
AI4TS
236
46
0
31 Mar 2023
Dual Cross-Attention for Medical Image Segmentation
Engineering applications of artificial intelligence (Eng. Appl. Artif. Intell.), 2023
Gorkem Can Ates
P. Mohan
Emrah Çelik
164
137
0
30 Mar 2023
Object Discovery from Motion-Guided Tokens
Computer Vision and Pattern Recognition (CVPR), 2023
Zhipeng Bao
P. Tokmakov
Yu-Xiong Wang
Adrien Gaidon
M. Hebert
OCL
204
28
0
27 Mar 2023
RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning
Yabin Zhu
Chenglong Li
Tianlin Li
Jin Tang
Zhixiang Huang
219
15
0
26 Mar 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Jue Wang
Wenjie Zhu
Pichao Wang
Xiang Yu
Linda Liu
Mohamed Omar
Raffay Hamid
208
159
0
25 Mar 2023
Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yuliang Cai
Jesse Thomason
Mohammad Rostami
VLM
CLL
193
12
0
25 Mar 2023
Learning and Verification of Task Structure in Instructional Videos
Medhini Narasimhan
Licheng Yu
Sean Bell
Ning Zhang
Trevor Darrell
254
24
0
23 Mar 2023
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
Computer Vision and Pattern Recognition (CVPR), 2023
Dohwan Ko
Joon-Young Choi
Hyeong Kyu Choi
Kyoung-Woon On
Byungseok Roh
Hyunwoo J. Kim
226
29
0
23 Mar 2023
CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Yiting Cheng
Fangyun Wei
Jianmin Bao
Dong Chen
Wenqian Zhang
SLR
195
40
0
22 Mar 2023
Text with Knowledge Graph Augmented Transformer for Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Xin Gu
G. Chen
Yufei Wang
Libo Zhang
Tiejian Luo
Longyin Wen
211
73
0
22 Mar 2023
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Sixun Dong
Huazhang Hu
Dongze Lian
Weixin Luo
Yichen Qian
Shenghua Gao
ViT
AI4TS
274
18
0
22 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
IEEE transactions on multimedia (IEEE TMM), 2023
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
381
50
0
21 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
448
68
0
21 Mar 2023
Retrieving Multimodal Information for Augmented Generation: A Survey
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ruochen Zhao
Hailin Chen
Weishi Wang
Fangkai Jiao
Do Xuan Long
...
Bosheng Ding
Xiaobao Guo
Minzhi Li
Xingxuan Li
Shafiq Joty
411
127
0
20 Mar 2023
Dual-path Adaptation from Image to Video Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
ViT
250
57
0
17 Mar 2023
Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a Single Image using Diffusion Models
D. Kothandaraman
Wanrong Zhu
Ming Lin
Dinesh Manocha
229
6
0
15 Mar 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
179
17
0
12 Mar 2023
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Teng Wang
Jinrui Zhang
Feng Zheng
Wenhao Jiang
Ran Cheng
Ping Luo
VLM
250
14
0
11 Mar 2023
TQ-Net: Mixed Contrastive Representation Learning For Heterogeneous Test Questions
He Zhu
Xihua Li
Xuemin Zhao
Yunbo Cao
Shan Yu
147
0
0
09 Mar 2023
Comparing Trajectory and Vision Modalities for Verb Representation
Dylan Ebert
Chen Sun
Ellie Pavlick
92
1
0
08 Mar 2023
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
Neural Information Processing Systems (NeurIPS), 2023
Wenlong Huang
Fei Xia
Dhruv Shah
Danny Driess
Andy Zeng
...
Pete Florence
Igor Mordatch
Sergey Levine
Karol Hausman
Brian Ichter
LM&Ro
256
78
0
01 Mar 2023
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
Computer Vision and Pattern Recognition (CVPR), 2023
Dezhao Luo
Jiabo Huang
S. Gong
Hailin Jin
Yang Liu
VGen
327
42
0
28 Feb 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
497
325
0
27 Feb 2023
Contrastive Video Question Answering via Video Graph Transformer
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
248
51
0
27 Feb 2023
Deep Learning for Video-Text Retrieval: a Review
International Journal of Multimedia Information Retrieval (IJMIR), 2023
Cunjuan Zhu
Qi Jia
Wei Chen
Yanming Guo
Yu Liu
226
28
0
24 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Machine Intelligence Research (MIR), 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
467
272
0
20 Feb 2023
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
AAAI Conference on Artificial Intelligence (AAAI), 2023
Weihong Zhong
Mao Zheng
Duyu Tang
Xuan Luo
Heng Gong
Xiaocheng Feng
Bing Qin
384
9
0
20 Feb 2023
Hyneter: Hybrid Network Transformer for Object Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Dong Chen
Duoqian Miao
Xuepeng Zhao
ViT
193
6
0
18 Feb 2023
Transformadores: Fundamentos teoricos y Aplicaciones
J. D. L. Torre
293
0
0
18 Feb 2023
Multimodal Subtask Graph Generation from Instructional Videos
Y. Jang
Sungryull Sohn
Lajanugen Logeswaran
Tiange Luo
Moontae Lee
Ho Hin Lee
195
14
0
17 Feb 2023
Hierarchical Cross-modal Transformer for RGB-D Salient Object Detection
Hao Chen
Feihong Shen
ViT
108
1
0
16 Feb 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Journal of Computing and Information Science in Engineering (JCISE), 2023
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
356
64
0
14 Feb 2023
Large Scale Multi-Lingual Multi-Modal Summarization Dataset
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Yash Verma
Anubhav Jangra
Raghvendra Kumar
S. Saha
114
22
0
13 Feb 2023
BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization
AAAI Conference on Artificial Intelligence (AAAI), 2023
Weichao Zhao
Hezhen Hu
Wen-gang Zhou
Jiaxin Shi
Houqiang Li
SLR
273
60
0
10 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Automatic Speech Recognition & Understanding (ASRU), 2023
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
382
43
0
10 Feb 2023
SwinCross: Cross-modal Swin Transformer for Head-and-Neck Tumor Segmentation in PET/CT Images
Medical Physics (Lancaster) (Med. Phys.), 2023
Gary Y. Li
Junyu Chen
Se-In Jang
Kuang Gong
Shijie Zhao
ViT
MedIm
213
21
0
08 Feb 2023
Program Generation from Diverse Video Demonstrations
British Machine Vision Conference (BMVC), 2023
Anthony Manchin
Jamie Sherrah
Qi Wu
Anton Van Den Hengel
VGen
83
0
0
01 Feb 2023
Semi-Parametric Video-Grounded Text Generation
Sungdong Kim
Jin-Hwa Kim
Jiyoung Lee
Minjoon Seo
VGen
244
17
0
27 Jan 2023
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
Computer Vision and Pattern Recognition (CVPR), 2023
Ruyang Liu
Jingjia Huang
Ge Li
Jiashi Feng
Xing Wu
Thomas H. Li
AI4TS
CLIP
VLM
257
74
0
26 Jan 2023
Flow-guided Semi-supervised Video Object Segmentation
Yushan Zhang
Andreas Robinson
M. Magnusson
Michael Felsberg
VOS
188
1
0
25 Jan 2023
MultiNet with Transformers: A Model for Cancer Diagnosis Using Images
H. Barzekar
Yash J. Patel
L. Tong
Zeyun Yu
MedIm
181
8
0
21 Jan 2023
Temporal Perceiving Video-Language Pre-training
Fan Ma
Xiaojie Jin
Heng Wang
Jingjia Huang
Linchao Zhu
Jiashi Feng
Yi Yang
VLM
206
17
0
18 Jan 2023
A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jie Gui
Tuo Chen
Jing Zhang
Qiong Cao
Zhe Sun
Haoran Luo
Dacheng Tao
565
354
0
13 Jan 2023
Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction
International Conference on Machine Learning (ICML), 2023
Khai Nguyen
Dang Nguyen
N. Ho
166
9
0
12 Jan 2023