Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.10755
Cited By
WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models
21 August 2023
Conghui He
Zhenjiang Jin
Chaoxi Xu
Jiantao Qiu
Bin Wang
Wei Li
Hang Yan
Jiaqi Wang
Da Lin
Re-assign community
ArXiv
PDF
HTML
Papers citing
"WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models"
7 / 7 papers shown
Title
Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models
Xinlin Zhuang
Jiahui Peng
Ren Ma
Y. Wang
Tianyi Bai
Xingjian Wei
Jiantao Qiu
Chi Zhang
Ying Qian
Conghui He
31
0
0
19 Apr 2025
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
25
25
0
10 Oct 2024
Enhancing Startup Success Predictions in Venture Capital: A GraphRAG Augmented Multivariate Time Series Method
Zitian Gao
Yihao Xiao
AI4TS
19
1
0
18 Aug 2024
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
1,899
0
30 Jan 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
375
4,010
0
28 Jan 2022
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
845
0
17 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
231
1,508
0
31 Dec 2020
1