arXiv: 2406.11271
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens
17 June 2024
Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, E. Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Ludwig Schmidt
Papers citing "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" (5 papers):
Baichuan-Omni-1.5 Technical Report
Yadong Li, J. Liu, Tao Zhang, Tao Zhang, S. Chen, ..., Jianhua Xu, Haoze Sun, Mingan Lin, Zenan Zhou, Weipeng Chen
28 Jan 2025
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang, Lidong Bing
01 Jan 2025
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia, Siwei Han, Shi Qiu, Yiyang Zhou, Zhaoyang Wang, ..., Chenhang Cui, Mingyu Ding, Linjie Li, Lijuan Wang, Huaxiu Yao
14 Oct 2024
DataComp-LM: In search of the next generation of training sets for language models
Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, ..., Alexandros G. Dimakis, Y. Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar
17 Jun 2024
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
31 Dec 2020