End-to-End Learning of Visual Representations from Uncurated Instructional Videos

13 December 2019

Antoine Miech

Jean-Baptiste Alayrac

Papers citing "End-to-End Learning of Visual Representations from Uncurated Instructional Videos"

50 / 165 papers shown

Title
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning Eric Jang A. Irpan Mohi Khansari Daniel Kappler F. Ebert Corey Lynch Sergey Levine Chelsea Finn LM&Ro 38 516 0 04 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Junnan Li Dongxu Li Caiming Xiong S. Hoi MLLM BDL VLM CLIP 390 4,124 0 28 Jan 2022
Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives David T. Hoffmann Nadine Behrmann Juergen Gall Thomas Brox M. Noroozi 25 43 0 27 Jan 2022
Learning To Recognize Procedural Activities with Distant Supervision Xudong Lin Fabio Petroni Gedas Bertasius Marcus Rohrbach Shih-Fu Chang Lorenzo Torresani 22 82 0 26 Jan 2022
SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering Peixi Xiong Quanzeng You Pei Yu Zicheng Liu Ying Wu 10 5 0 25 Jan 2022
End-to-end Generative Pretraining for Multimodal Video Captioning Paul Hongsuck Seo Arsha Nagrani Anurag Arnab Cordelia Schmid 27 164 0 20 Jan 2022
Self-supervised Video Representation Learning with Cascade Positive Retrieval Cheng-En Wu Farley Lai Yujie Hu Asim Kadav SSL AI4TS 20 3 0 20 Jan 2022
Video Transformers: A Survey Javier Selva A. S. Johansen Sergio Escalera Kamal Nasrollahi T. Moeslund Albert Clapés ViT 20 103 0 16 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions Yuying Ge Yixiao Ge Xihui Liu Dian Li Ying Shan Xiaohu Qie Ping Luo BDL 16 108 0 13 Jan 2022
Robust Contrastive Learning against Noisy Views Ching-Yao Chuang R. Devon Hjelm Xin Eric Wang Vibhav Vineet Neel Joshi Antonio Torralba Stefanie Jegelka Ya-heng Song NoLa 13 68 0 12 Jan 2022
Align and Prompt: Video-and-Language Pre-training with Entity Prompts Dongxu Li Junnan Li Hongdong Li Juan Carlos Niebles S. Hoi 20 191 0 17 Dec 2021
Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration Yu Wang Jingyang Lin Jingjing Zou Yingwei Pan Ting Yao Tao Mei OODD 24 12 0 15 Dec 2021
SVIP: Sequence VerIfication for Procedures in Videos Yichen Qian Weixin Luo Dongze Lian Xu Tang P. Zhao Shenghua Gao ViT 16 17 0 13 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation Learning Rui Qian Yeqing Li Liangzhe Yuan Boqing Gong Ting Liu Matthew A. Brown Serge J. Belongie Ming-Hsuan Yang Hartwig Adam Yin Cui AI4TS 41 6 0 08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval Nina Shvetsova Brian Chen Andrew Rouditchenko Samuel Thomas Brian Kingsbury Rogerio Feris David F. Harwath James R. Glass Hilde Kuehne ViT 23 129 0 08 Dec 2021
Time-Equivariant Contrastive Video Representation Learning Simon Jenni Hailin Jin SSL AI4TS 135 60 0 07 Dec 2021
Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation Dipika Singhania R. Rahaman Angela Yao 19 23 0 02 Dec 2021
CRIS: CLIP-Driven Referring Image Segmentation Zhaoqing Wang Yu Lu Qiang Li Xunqiang Tao Yan Guo Ming Gong Tongliang Liu VLM 36 359 0 30 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio Valerii Likhosherstov Anurag Arnab K. Choromanski Mario Lucic Yi Tay Adrian Weller Mostafa Dehghani ViT 33 73 0 25 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling Tsu-jui Fu Linjie Li Zhe Gan Kevin Qinghong Lin W. Wang Lijuan Wang Zicheng Liu VLM 34 216 0 24 Nov 2021
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching Yaya Shi Xu Yang Haiyang Xu Chunfen Yuan Bing Li Weiming Hu Zhengjun Zha 31 33 0 17 Nov 2021
Masking Modalities for Cross-modal Video Retrieval Valentin Gabeur Arsha Nagrani Chen Sun Alahari Karteek Cordelia Schmid 11 29 0 01 Nov 2021
BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval Ning Han Jingjing Chen Chuhao Shi Yawen Zeng Guangyi Xiao Hao Chen 22 10 0 29 Oct 2021
Survey: Transformer based Video-Language Pre-training Ludan Ruan Qin Jin VLM ViT 64 44 0 21 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition Mengmeng Wang Jiazheng Xing Yong Liu VLM 149 362 0 17 Sep 2021
Support-Set Based Cross-Supervision for Video Grounding Xinpeng Ding N. Wang Shiwei Zhang De-Chun Cheng Xiaomeng Li Ziyuan Huang Mingqian Tang Xinbo Gao 33 42 0 24 Aug 2021
HANet: Hierarchical Alignment Networks for Video-Text Retrieval Peng Wu Xiangteng He Mingqian Tang Yiliang Lv Jing Liu 23 52 0 26 Jul 2021
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries Jie Lei Tamara L. Berg Mohit Bansal ViT 19 62 0 20 Jul 2021
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP Han Fang Pengfei Xiong Luhui Xu Yu Chen CLIP VLM 13 291 0 21 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting Martine Toering Ioannis Gatopoulos M. Stol Vincent Tao Hu SSL 25 11 0 18 Jun 2021
Linguistic Structures as Weak Supervision for Visual Scene Graph Generation Keren Ye Adriana Kovashka 21 52 0 28 May 2021
Divide and Contrast: Self-supervised Learning from Uncurated Data Yonglong Tian Olivier J. Hénaff Aaron van den Oord SSL 51 96 0 17 May 2021
Designing Multimodal Datasets for NLP Challenges James Pustejovsky E. Holderness Jingxuan Tu Parker Glenn Kyeongmin Rim Kelley Lynch R. Brutti 13 5 0 12 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey Jinjie Ni Tom Young Vlad Pandelea Fuzhao Xue Erik Cambria 52 267 0 10 May 2021
Compressing Visual-linguistic Model via Knowledge Distillation Zhiyuan Fang Jianfeng Wang Xiaowei Hu Lijuan Wang Yezhou Yang Zicheng Liu VLM 31 96 0 05 Apr 2021
Multiview Pseudo-Labeling for Semi-supervised Learning from Video Bo Xiong Haoqi Fan Kristen Grauman Christoph Feichtenhofer SSL 19 49 0 01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning Adrià Recasens Pauline Luc Jean-Baptiste Alayrac Luyu Wang Ross Hemsley ... Florent Altché M. Valko Jean-Bastien Grill Aaron van den Oord Andrew Zisserman SSL AI4TS 23 127 0 30 Mar 2021
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization Mengmeng Xu Juan-Manuel Perez-Rua Xiatian Zhu Bernard Ghanem Brais Martinez 15 27 0 28 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval Maksim Dzabraev M. Kalashnikov Stepan Alekseevich Komkov Aleksandr Petiushko 13 128 0 19 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning Mandela Patrick Yuki M. Asano Bernie Huang Ishan Misra Florian Metze Joao Henriques Andrea Vedaldi AI4TS 16 33 0 18 Mar 2021
On Semantic Similarity in Video Retrieval Michael Wray Hazel Doughty Dima Damen 21 66 0 18 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models Po-Yao (Bernie) Huang Mandela Patrick Junjie Hu Graham Neubig Florian Metze Alexander G. Hauptmann MLLM VLM 19 56 0 16 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision Alec Radford Jong Wook Kim Chris Hallacy Aditya A. Ramesh Gabriel Goh ... Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger Ilya Sutskever CLIP VLM 98 27,632 0 26 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation Jaemin Cho Jie Lei Hao Tan Mohit Bansal MLLM 249 525 0 04 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers Lisa Anne Hendricks John F. J. Mellor R. Schneider Jean-Baptiste Alayrac Aida Nematzadeh 75 110 0 31 Jan 2021
Learning Temporal Dynamics from Cycles in Narrated Video Dave Epstein Jiajun Wu Cordelia Schmid Chen Sun AI4TS 28 14 0 07 Jan 2021
Learning the Predictability of the Future Dídac Surís Ruoshi Liu Carl Vondrick 16 71 0 01 Jan 2021
Look Before you Speak: Visually Contextualized Utterances Paul Hongsuck Seo Arsha Nagrani Cordelia Schmid 19 66 0 10 Dec 2020
Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning Zehua Zhang David J. Crandall AI4TS SSL 16 23 0 23 Nov 2020
Game Plan: What AI can do for Football, and What Football can do for AI K. Tuyls Shayegan Omidshafiei Paul Muller Zhe Wang Jerome T. Connor ... Simon Bouton Nathalie Beauguerlange Jackson Broshear T. Graepel Demis Hassabis 33 78 0 18 Nov 2020