Weakly-Supervised Alignment of Video With Text

22 May 2015

Papers citing "Weakly-Supervised Alignment of Video With Text"

50 / 62 papers shown

A Survey on Integrated Sensing, Communication, and Computation. Dingzhu Wen, Yong Zhou, Xiaoyang Li, Yuanming Shi, Kaibin Huang, Khaled B. Letaief (15 Aug 2024)
OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation. Yuerong Li, Zhengrong Xue, Huazhe Xu (12 Sep 2023)
Learning to Ground Instructional Articles in Videos through Narrations. E. Mavroudi, Triantafyllos Afouras, Lorenzo Torresani (06 Jun 2023)
Video alignment using unsupervised learning of local and global features. Niloofar Fakhfour, Mohammad ShahverdiKondori, Hoda Mohammadzade (13 Apr 2023)
Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations. Yiwu Zhong, Licheng Yu, Yang Bai, Shangwen Li, Xueting Yan, Yin Li (31 Mar 2023)
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video. Minsu Kim, Chae Won Kim, Y. Ro (27 Feb 2023)
Temporal Action Segmentation: An Analysis of Modern Techniques. Guodong Ding, Fadime Sener, Angela Yao (19 Oct 2022)
Temporal Alignment Networks for Long-term Video. Tengda Han, Weidi Xie, Andrew Zisserman (06 Apr 2022)
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos. Tomáš Souček, Jean-Baptiste Alayrac, Antoine Miech, Ivan Laptev, Josef Sivic (22 Mar 2022)
Conditional Gradients for the Approximate Vanishing Ideal. E. Wirth, S. Pokutta (07 Feb 2022)
Unsupervised Temporal Video Grounding with Deep Semantic Clustering. Daizong Liu, Xiaoye Qu, Yinzhen Wang, Xing Di, Kai Zou, Yu Cheng, Zichuan Xu, Pan Zhou (14 Jan 2022)
Aligning Subtitles in Sign Language Videos. Hannah Bull, Triantafyllos Afouras, Gül Varol, Samuel Albanie, Liliane Momeni, Andrew Zisserman (06 May 2021)
Temporal Query Networks for Fine-grained Video Understanding. Chuhan Zhang, Ankush Gupta, Andrew Zisserman (19 Apr 2021)
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval. Ioana Croitoru, Simion-Vlad Bogolin, Marius Leordeanu, Hailin Jin, Andrew Zisserman, Samuel Albanie, Yang Liu (16 Apr 2021)
A Comprehensive Review of the Video-to-Text Problem. Jesus Perez-Martin, B. Bustos, S. Guimarães, I. Sipiran, Jorge A. Pérez, Grethel Coello Said (27 Mar 2021)
Action Duration Prediction for Segment-Level Alignment of Weakly-Labeled Videos. Reza Ghoddoosian, S. Sayed, V. Athitsos (20 Nov 2020)
Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions. Jianan Wang, Boyang Albert Li, Xiangyu Fan, Jing-Hua Lin, Yanwei Fu (15 Nov 2020)
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning. Simon Ging, Mohammadreza Zolfaghari, Hamed Pirsiavash, Thomas Brox (01 Nov 2020)
Frame-wise Cross-modal Matching for Video Moment Retrieval. Haoyu Tang, Jihua Zhu, Meng Liu, Zan Gao, Zhiyong Cheng (22 Sep 2020)
A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks. Angela S. Lin, Sudha Rao, Asli Celikyilmaz, E. Nouri, Chris Brockett, Debadeepta Dey, Bill Dolan (19 May 2020)
Matching Questions and Answers in Dialogues from Online Forums. Qi Jia, Mengxue Zhang, Shengyao Zhang, Kenny Q. Zhu (19 May 2020)
Condensed Movies: Story Based Retrieval with Contextual Embeddings. Max Bain, Arsha Nagrani, A. Brown, Andrew Zisserman (08 May 2020)
Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks. M. Rashid, Hedvig Kjellström, Yong Jae Lee (04 Feb 2020)
Discriminative Clustering with Representation Learning with any Ratio of Labeled to Unlabeled Data. Corinne Jones, Vincent Roulet, Zaïd Harchaoui (30 Dec 2019)
End-to-End Learning of Visual Representations from Uncurated Instructional Videos. Antoine Miech, Jean-Baptiste Alayrac, Lucas Smaira, Ivan Laptev, Josef Sivic, Andrew Zisserman (13 Dec 2019)
Weakly-Supervised Video Moment Retrieval via Semantic Completion Network. Zhijie Lin, Zhou Zhao, Zhu Zhang, Qi Wang, Huasheng Liu (19 Nov 2019)
Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction. Jingwen Wang, Lin Ma, Wenhao Jiang (11 Sep 2019)
Finding Moments in Video Collections Using Natural Language. Victor Escorcia, Mattia Soldan, Josef Sivic, Bernard Ghanem, Bryan C. Russell (30 Jul 2019)
Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos. Zhu Zhang, Zhijie Lin, Zhou Zhao, Zhenxin Xiao (06 Jun 2019)
TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks. Guy Lev, Michal Shmueli-Scheuer, Jonathan Herzig, Achiya Jerbi, D. Konopnicki (04 Jun 2019)
A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation. Hilde Kuehne, Alexander Richard, Juergen Gall (03 Jun 2019)
Tripping through time: Efficient Localization of Activities in Videos. Meera Hahn, Asim Kadav, James M. Rehg, H. Graf (22 Apr 2019)
Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents. Jack Hessel, Lillian Lee, David M. Mimno (16 Apr 2019)
Weakly Supervised Gaussian Networks for Action Detection. Basura Fernando, Cheston Tan Yin Chet, Hakan Bilen (16 Apr 2019)
Action Recognition from Single Timestamp Supervision in Untrimmed Videos. Davide Moltisanti, Sanja Fidler, Dima Damen (09 Apr 2019)
Modularized Textual Grounding for Counterfactual Resilience. Zhiyuan Fang, Shu Kong, Charless C. Fowlkes, Yezhou Yang (07 Apr 2019)
Weakly Supervised Video Moment Retrieval From Text Queries. Niluthpol Chowdhury Mithun, S. Paul, A. Roy-Chowdhury (05 Apr 2019)
Unsupervised Image Matching and Object Discovery as Optimization. Huy V. Vo, Francis R. Bach, Minsu Cho, Kai Han, Yann LeCun, P. Pérez, Jean Ponce (05 Apr 2019)
Cross-task weakly supervised learning from instructional videos. Dimitri Zhukov, Jean-Baptiste Alayrac, R. G. Cinbis, David Fouhey, Ivan Laptev, Josef Sivic (19 Mar 2019)
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos. Dongliang He, Xiang Zhao, Jizhou Huang, Fu Li, Xiao-Chang Liu, Shilei Wen (21 Jan 2019)
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation. C. Chang, De-An Huang, Yanan Sui, Li Fei-Fei, Juan Carlos Niebles (09 Jan 2019)
Weakly Supervised Dense Event Captioning in Videos. Xuguang Duan, Wen-bing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, Junzhou Huang (10 Dec 2018)
Learning to Localize and Align Fine-Grained Actions to Sparse Instructions. Meera Hahn, Nataniel Ruiz, Jean-Baptiste Alayrac, Ivan Laptev, James M. Rehg (22 Sep 2018)
W-TALC: Weakly-supervised Temporal Activity Localization and Classification. S. Paul, Sourya Roy, A. Roy-Chowdhury (27 Jul 2018)
Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector. Jia-Xing Zhong, Nannan Li, Weijie Kong, Zhang Tao, Thomas H. Li, Ge Li (09 Jul 2018)
To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression. Yitian Yuan, Tao Mei, Wenwu Zhu (19 Apr 2018)
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data. Antoine Miech, Ivan Laptev, Josef Sivic (07 Apr 2018)
A Neural Multi-sequence Alignment TeCHnique (NeuMATCH). Pelin Dogan, Boyang Albert Li, Leonid Sigal, Markus Gross (19 Feb 2018)
Multimodal Visual Concept Learning with Weakly Supervised Techniques. Giorgos Bouritsas, Petros Koutras, Athanasia Zlatintsi, Petros Maragos (03 Dec 2017)
Localizing Moments in Video with Natural Language. Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan C. Russell (04 Aug 2017)