arXiv:2307.06942
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
13 July 2023
Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, X. Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Conghui He, Ping Luo, Ziwei Liu, Yali Wang, Limin Wang, Yu Qiao
VLM
VGen
Papers citing "InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation"
2 / 52 papers shown
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan, K. Raman, Jiecao Chen, Michael Bendersky, Marc Najork
VLM
197
307
0
02 Mar 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo, P. Sharma, Nan Ding, Radu Soricut
VLM
273
1,077
0
17 Feb 2021