Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.09857
Cited By
IGOT: Information Gain Optimized Tokenizer on Domain Adaptive Pretraining
16 May 2024
Dawei Feng
Yihai Zhang
Zhixuan Xu
SyDa
Re-assign community
ArXiv
PDF
HTML
Papers citing
"IGOT: Information Gain Optimized Tokenizer on Domain Adaptive Pretraining"
6 / 6 papers shown
Title
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Elliot Bolton
Abhinav Venigalla
Michihiro Yasunaga
David Leo Wright Hall
Betty Xiong
...
R. Daneshjou
Jonathan Frankle
Percy Liang
Michael Carbin
Christopher D. Manning
LM&MA
MedIm
26
45
0
27 Mar 2024
Chip-Chat: Challenges and Opportunities in Conversational Hardware Design
Jason Blocklove
S. Garg
Ramesh Karri
Hammond Pearce
38
164
0
22 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
203
2,232
0
22 Mar 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
240
1,070
0
05 Oct 2022
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Shafiq R. Joty
S. Hoi
204
1,451
0
02 Sep 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,977
0
31 Dec 2020
1