arXiv:1609.08675
YouTube-8M: A Large-Scale Video Classification Benchmark
27 September 2016
Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
G. Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
VLM
Papers citing
"YouTube-8M: A Large-Scale Video Classification Benchmark"
50 / 170 papers shown
Title
Advance Fake Video Detection via Vision Transformers
Joy Battocchio
S. Dell’Anna
Andrea Montibeller
Giulia Boato
ViT
VGen
34
0
0
29 Apr 2025
Get In Video: Add Anything You Want to the Video
Shaobin Zhuang
Zhipeng Huang
Binxin Yang
Ying Zhang
Fangyikang Wang
Canmiao Fu
Chong Sun
Zheng-Jun Zha
Chen Li
Y. Wang
DiffM
VGen
56
0
0
08 Mar 2025
Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Yang Cao
Zhao-quan Song
Chiwun Yang
VGen
44
2
0
01 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
72
10
0
28 Jan 2025
BILLNET: A Binarized Conv3D-LSTM Network with Logic-gated residual architecture for hardware-efficient video inference
Van Thien Nguyen
William Guicquero
Gilles Sicard
3DV
MQ
74
2
0
24 Jan 2025
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Xiufeng Song
Xiao Guo
J. Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
DiffM
VGen
69
9
0
31 Oct 2024
SONIQUE: Video Background Music Generation Using Unpaired Audio-Visual Data
Liqian Zhang
Magdalena Fuentes
DiffM
VGen
37
3
0
04 Oct 2024
Towards Student Actions in Classroom Scenes: New Dataset and Baseline
Zhuolin Tan
Chenqiang Gao
Anyong Qin
Ruixin Chen
Tiecheng Song
Feng Yang
Deyu Meng
29
0
0
02 Sep 2024
Start from Video-Music Retrieval: An Inter-Intra Modal Loss for Cross Modal Retrieval
Zeyu Chen
Pengfei Zhang
Kai Ye
Wei Dong
Xin Feng
Yana Zhang
41
0
0
28 Jul 2024
MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
Yu-Fen Huang
Nikki Moran
Simon Coleman
Jon Kelly
Shun-Hwa Wei
...
Chih-Hsuan Li
Da-Yu Huang
Hsuan-Kai Kao
Ting-Wei Lin
Li Su
24
1
0
10 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Qifeng Chen
Y. Guo
VGen
100
16
0
06 Jun 2024
LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild
Lang He
Kai Chen
Junnan Zhao
Yimeng Wang
Ercheng Pei
...
Shiqing Zhang
Jie Zhang
Zhongmin Wang
Tao He
Prayag Tiwari
50
3
0
09 May 2024
Embodied Understanding of Driving Scenarios
Yunsong Zhou
Linyan Huang
Qingwen Bu
Jia Zeng
Tianyu Li
Hang Qiu
Hongzi Zhu
Minyi Guo
Yu Qiao
Hongyang Li
LM&Ro
55
31
0
07 Mar 2024
Multimodal Transformer With a Low-Computational-Cost Guarantee
Sungjin Park
Edward Choi
46
1
0
23 Feb 2024
A Novel BERT-based Classifier to Detect Political Leaning of YouTube Videos based on their Titles
Nouar Aldahoul
Talal Rahwan
Yasir Zaki
24
0
0
16 Feb 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Eric Wang
X. Li
Luisa Verdoliva
Shu Hu
83
56
0
22 Jan 2024
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
24
5
0
21 Dec 2023
Learning Human Action Recognition Representations Without Real Humans
Howard Zhong
Samarth Mishra
Donghyun Kim
SouYoung Jin
Rameswar Panda
Hildegard Kuehne
Leonid Karlinsky
Venkatesh Saligrama
Aude Oliva
Rogerio Feris
24
3
0
10 Nov 2023
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong
Bin Chen
Xiulong Liu
Paweł Polak
Peng Zhang
LRM
37
26
0
10 Oct 2023
CPR-Coach: Recognizing Composite Error Actions based on Single-class Training
Shunli Wang
Qing Yu
Shuai Wang
Dingkang Yang
Liuzhen Su
Xiao Zhao
Haopeng Kuang
Pei Zhang
Peng Zhai
Lihua Zhang
31
3
0
21 Sep 2023
Video-to-Music Recommendation using Temporal Alignment of Segments
Laure Prétet
G. Richard
Clement Souchier
Geoffroy Peeters
AI4TS
29
13
0
12 Jun 2023
Fairness and Bias in Truth Discovery Algorithms: An Experimental Analysis
Simone Lazier
Saravanan Thirumuruganathan
Hadis Anahideh
8
3
0
25 Apr 2023
VMCML: Video and Music Matching via Cross-Modality Lifting
Yi-Shan Lee
Wei-Cheng Tseng
Fu-En Wang
Min Sun
11
0
0
22 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
23
220
0
27 Feb 2023
Dreamix: Video Diffusion Models are General Video Editors
Eyal Molad
Eliahu Horwitz
Dani Valevski
Alex Rav Acha
Yossi Matias
Yael Pritch
Yaniv Leviathan
Yedid Hoshen
DiffM
VGen
25
181
0
02 Feb 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
20
51
0
05 Jan 2023
FEVA: Fast Event Video Annotation Tool
Snehesh Shrestha
William Sentosatio
Huaishu Peng
Cornelia Fermuller
Yiannis Aloimonos
33
5
0
01 Jan 2023
Depression Diagnosis and Analysis via Multimodal Multi-order Factor Fusion
Chengbo Yuan
Qianhui Xu
Yong Luo
10
6
0
31 Dec 2022
NIRVANA: Neural Implicit Representations of Videos with Adaptive Networks and Autoregressive Patch-wise Modeling
Shishira R. Maiya
Sharath Girish
Max Ehrlich
Hanyu Wang
Kwot Sin Lee
Patrick Poirson
Pengxiang Wu
Chen Wang
Abhinav Shrivastava
VGen
36
40
0
30 Dec 2022
Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation
Jie Jiang
Zhimin Li
Jiangfeng Xiong
Rongwei Quan
Qinglin Lu
Wei Liu
16
2
0
09 Dec 2022
Seeing the Unseen: Errors and Bias in Visual Datasets
Hongrui Jin
19
0
0
03 Nov 2022
A Human-ML Collaboration Framework for Improving Video Content Reviews
Meghana Deodhar
Xiao Ma
Yixin Cai
Alex Koes
Alex Beutel
Jilin Chen
29
3
0
18 Oct 2022
Debiased Cross-modal Matching for Content-based Micro-video Background Music Recommendation
Jin Yi
Zhenzhong Chen
33
1
0
07 Aug 2022
MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
De-An Huang
Zhiding Yu
Anima Anandkumar
VLM
39
78
0
03 Aug 2022
Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos
Juncheng Billy Li
Junlin Xie
Linchao Zhu
Long Qian
Siliang Tang
...
Haochen Shi
Shengyu Zhang
Longhui Wei
Qi Tian
Yueting Zhuang
32
12
0
03 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
20
17
0
01 Aug 2022
Membership Inference Attacks via Adversarial Examples
Hamid Jalalzai
Elie Kadoche
Rémi Leluc
Vincent Plassier
AAML
FedML
MIACV
27
7
0
27 Jul 2022
BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis
Davide Moltisanti
Jinyi Wu
Bo Dai
Chen Change Loy
DiffM
17
4
0
20 Jul 2022
Large-scale Robustness Analysis of Video Action Recognition Models
Madeline Chantry Schiappa
Naman Biyani
Prudvi Kamtam
Shruti Vyas
Hamid Palangi
Vibhav Vineet
Y. S. Rawat
AAML
24
24
0
04 Jul 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
34
131
0
18 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
42
347
0
17 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
24
52
0
02 Jun 2022
Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
16
5
0
17 May 2022
On Negative Sampling for Audio-Visual Contrastive Learning from Movies
Mahdi M. Kalayeh
Shervin Ardeshir
Lingyi Liu
Nagendra Kamath
Ashok Chandrashekar
SSL
22
3
0
29 Apr 2022
Google Scanned Objects: A High-Quality Dataset of 3D Scanned Household Items
Laura Downs
Anthony G. Francis
Nate Koenig
Brandon Kinman
R. Hickman
Krista Reymann
T. B. McHugh
Vincent Vanhoucke
LM&Ro
27
472
0
25 Apr 2022
Empirical Evaluation and Theoretical Analysis for Representation Learning: A Survey
Kento Nozawa
Issei Sato
AI4TS
19
4
0
18 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
SSL
34
53
0
15 Apr 2022
Leveraging Adversarial Examples to Quantify Membership Information Leakage
Ganesh Del Grosso
Hamid Jalalzai
Georg Pichler
C. Palamidessi
Pablo Piantanida
MIACV
26
21
0
17 Mar 2022
Transframer: Arbitrary Frame Prediction with Generative Models
C. Nash
João Carreira
Jacob Walker
Iain Barr
Andrew Jaegle
Mateusz Malinowski
Peter W. Battaglia
ViT
19
37
0
17 Mar 2022
GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal Grains
Lei Fan
Yiwen Ding
Dongdong Fan
Donglin Di
M. Pagnucco
Yang Song
AI4TS
21
19
0
10 Mar 2022