ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1912.06430
  4. Cited By
End-to-End Learning of Visual Representations from Uncurated
  Instructional Videos

End-to-End Learning of Visual Representations from Uncurated Instructional Videos

13 December 2019
Antoine Miech
Jean-Baptiste Alayrac
Lucas Smaira
Ivan Laptev
Josef Sivic
Andrew Zisserman
    VGen
    SSL
ArXivPDFHTML

Papers citing "End-to-End Learning of Visual Representations from Uncurated Instructional Videos"

50 / 165 papers shown
Title
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
Eric Jang
A. Irpan
Mohi Khansari
Daniel Kappler
F. Ebert
Corey Lynch
Sergey Levine
Chelsea Finn
LM&Ro
38
516
0
04 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
390
4,124
0
28 Jan 2022
Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning
  via Ranked Positives
Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives
David T. Hoffmann
Nadine Behrmann
Juergen Gall
Thomas Brox
M. Noroozi
25
43
0
27 Jan 2022
Learning To Recognize Procedural Activities with Distant Supervision
Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin
Fabio Petroni
Gedas Bertasius
Marcus Rohrbach
Shih-Fu Chang
Lorenzo Torresani
22
82
0
26 Jan 2022
SA-VQA: Structured Alignment of Visual and Semantic Representations for
  Visual Question Answering
SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering
Peixi Xiong
Quanzeng You
Pei Yu
Zicheng Liu
Ying Wu
10
5
0
25 Jan 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
27
164
0
20 Jan 2022
Self-supervised Video Representation Learning with Cascade Positive
  Retrieval
Self-supervised Video Representation Learning with Cascade Positive Retrieval
Cheng-En Wu
Farley Lai
Yujie Hu
Asim Kadav
SSL
AI4TS
20
3
0
20 Jan 2022
Video Transformers: A Survey
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
103
0
16 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Bridging Video-text Retrieval with Multiple Choice Questions
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
16
108
0
13 Jan 2022
Robust Contrastive Learning against Noisy Views
Robust Contrastive Learning against Noisy Views
Ching-Yao Chuang
R. Devon Hjelm
Xin Eric Wang
Vibhav Vineet
Neel Joshi
Antonio Torralba
Stefanie Jegelka
Ya-heng Song
NoLa
13
68
0
12 Jan 2022
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
S. Hoi
20
191
0
17 Dec 2021
Improving Self-supervised Learning with Automated Unsupervised Outlier
  Arbitration
Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration
Yu Wang
Jingyang Lin
Jingjing Zou
Yingwei Pan
Ting Yao
Tao Mei
OODD
24
12
0
15 Dec 2021
SVIP: Sequence VerIfication for Procedures in Videos
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
16
17
0
13 Dec 2021
Exploring Temporal Granularity in Self-Supervised Video Representation
  Learning
Exploring Temporal Granularity in Self-Supervised Video Representation Learning
Rui Qian
Yeqing Li
Liangzhe Yuan
Boqing Gong
Ting Liu
Matthew A. Brown
Serge J. Belongie
Ming-Hsuan Yang
Hartwig Adam
Yin Cui
AI4TS
41
6
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David F. Harwath
James R. Glass
Hilde Kuehne
ViT
23
129
0
08 Dec 2021
Time-Equivariant Contrastive Video Representation Learning
Time-Equivariant Contrastive Video Representation Learning
Simon Jenni
Hailin Jin
SSL
AI4TS
135
60
0
07 Dec 2021
Iterative Contrast-Classify For Semi-supervised Temporal Action
  Segmentation
Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation
Dipika Singhania
R. Rahaman
Angela Yao
19
23
0
02 Dec 2021
CRIS: CLIP-Driven Referring Image Segmentation
CRIS: CLIP-Driven Referring Image Segmentation
Zhaoqing Wang
Yu Lu
Qiang Li
Xunqiang Tao
Yan Guo
Ming Gong
Tongliang Liu
VLM
36
359
0
30 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
33
73
0
25 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token
  Modeling
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
W. Wang
Lijuan Wang
Zicheng Liu
VLM
34
216
0
24 Nov 2021
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained
  Embedding Matching
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
31
33
0
17 Nov 2021
Masking Modalities for Cross-modal Video Retrieval
Masking Modalities for Cross-modal Video Retrieval
Valentin Gabeur
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
11
29
0
01 Nov 2021
BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video
  Retrieval
BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval
Ning Han
Jingjing Chen
Chuhao Shi
Yawen Zeng
Guangyi Xiao
Hao Chen
22
10
0
29 Oct 2021
Survey: Transformer based Video-Language Pre-training
Survey: Transformer based Video-Language Pre-training
Ludan Ruan
Qin Jin
VLM
ViT
64
44
0
21 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
149
362
0
17 Sep 2021
Support-Set Based Cross-Supervision for Video Grounding
Support-Set Based Cross-Supervision for Video Grounding
Xinpeng Ding
N. Wang
Shiwei Zhang
De-Chun Cheng
Xiaomeng Li
Ziyuan Huang
Mingqian Tang
Xinbo Gao
33
42
0
24 Aug 2021
HANet: Hierarchical Alignment Networks for Video-Text Retrieval
HANet: Hierarchical Alignment Networks for Video-Text Retrieval
Peng Wu
Xiangteng He
Mingqian Tang
Yiliang Lv
Jing Liu
23
52
0
26 Jul 2021
QVHighlights: Detecting Moments and Highlights in Videos via Natural
  Language Queries
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Jie Lei
Tamara L. Berg
Mohit Bansal
ViT
19
62
0
20 Jul 2021
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP
Han Fang
Pengfei Xiong
Luhui Xu
Yu Chen
CLIP
VLM
13
291
0
21 Jun 2021
Self-supervised Video Representation Learning with Cross-Stream
  Prototypical Contrasting
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Martine Toering
Ioannis Gatopoulos
M. Stol
Vincent Tao Hu
SSL
25
11
0
18 Jun 2021
Linguistic Structures as Weak Supervision for Visual Scene Graph
  Generation
Linguistic Structures as Weak Supervision for Visual Scene Graph Generation
Keren Ye
Adriana Kovashka
21
52
0
28 May 2021
Divide and Contrast: Self-supervised Learning from Uncurated Data
Divide and Contrast: Self-supervised Learning from Uncurated Data
Yonglong Tian
Olivier J. Hénaff
Aaron van den Oord
SSL
51
96
0
17 May 2021
Designing Multimodal Datasets for NLP Challenges
Designing Multimodal Datasets for NLP Challenges
James Pustejovsky
E. Holderness
Jingxuan Tu
Parker Glenn
Kyeongmin Rim
Kelley Lynch
R. Brutti
13
5
0
12 May 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic
  Survey
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Erik Cambria
52
267
0
10 May 2021
Compressing Visual-linguistic Model via Knowledge Distillation
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
31
96
0
05 Apr 2021
Multiview Pseudo-Labeling for Semi-supervised Learning from Video
Multiview Pseudo-Labeling for Semi-supervised Learning from Video
Bo Xiong
Haoqi Fan
Kristen Grauman
Christoph Feichtenhofer
SSL
19
49
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
23
127
0
30 Mar 2021
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action
  Localization
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization
Mengmeng Xu
Juan-Manuel Perez-Rua
Xiatian Zhu
Bernard Ghanem
Brais Martinez
15
27
0
28 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
13
128
0
19 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation
  Learning
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
16
33
0
18 Mar 2021
On Semantic Similarity in Video Retrieval
On Semantic Similarity in Video Retrieval
Michael Wray
Hazel Doughty
Dima Damen
21
66
0
18 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual
  Transfer of Vision-Language Models
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
19
56
0
16 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
98
27,632
0
26 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
525
0
04 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal
  Transformers
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
75
110
0
31 Jan 2021
Learning Temporal Dynamics from Cycles in Narrated Video
Learning Temporal Dynamics from Cycles in Narrated Video
Dave Epstein
Jiajun Wu
Cordelia Schmid
Chen Sun
AI4TS
28
14
0
07 Jan 2021
Learning the Predictability of the Future
Learning the Predictability of the Future
Dídac Surís
Ruoshi Liu
Carl Vondrick
16
71
0
01 Jan 2021
Look Before you Speak: Visually Contextualized Utterances
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
19
66
0
10 Dec 2020
Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised
  Video Representation Learning
Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning
Zehua Zhang
David J. Crandall
AI4TS
SSL
16
23
0
23 Nov 2020
Game Plan: What AI can do for Football, and What Football can do for AI
Game Plan: What AI can do for Football, and What Football can do for AI
K. Tuyls
Shayegan Omidshafiei
Paul Muller
Zhe Wang
Jerome T. Connor
...
Simon Bouton
Nathalie Beauguerlange
Jackson Broshear
T. Graepel
Demis Hassabis
33
78
0
18 Nov 2020
Previous
1234
Next