Dynamic Masking Rate Schedules for MLM Pretraining

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
24 May 2023
Zachary Ankner
Naomi Saphra
Davis W. Blalock
Jonathan Frankle
Matthew L. Leavitt
arXiv (abs) · PDF · HTML
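As context for this listing: a dynamic masking rate schedule varies the fraction of tokens masked for the MLM objective over the course of pretraining, rather than fixing it at a constant value (e.g., 15%). The sketch below is a minimal illustration under assumed settings, a linear decay from 30% to 15% with hypothetical helper names and step counts; it is not taken from the paper's implementation.

```python
import torch

def masking_rate(step, total_steps, start_rate=0.30, end_rate=0.15):
    """Linearly decay the masking rate over training.

    The endpoints (30% -> 15%) and the linear form are illustrative
    assumptions, not the schedule reported in the paper.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return start_rate + frac * (end_rate - start_rate)

def mask_tokens(input_ids, mask_token_id, rate, special_tokens_mask=None):
    """Randomly replace a `rate` fraction of tokens with [MASK] for MLM."""
    probs = torch.full(input_ids.shape, rate)
    if special_tokens_mask is not None:
        # Never mask special tokens such as [CLS] or [SEP].
        probs.masked_fill_(special_tokens_mask.bool(), 0.0)
    masked = torch.bernoulli(probs).bool()
    labels = input_ids.clone()
    labels[~masked] = -100            # compute loss only on masked positions
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id
    return corrupted, labels

# Hypothetical per-step usage inside a pretraining loop:
# rate = masking_rate(step, total_steps)
# inputs, labels = mask_tokens(batch_input_ids, tokenizer.mask_token_id, rate)
```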

Papers citing "Dynamic Masking Rate Schedules for MLM Pretraining"

8 / 8 papers shown
Chinese ModernBERT with Whole-Word Masking
Zeyu Zhao, Ningtao Wang, Xing Fu, Yu-Jie Cheng
14 Oct 2025

Understanding and Enhancing Mask-Based Pretraining towards Universal Representations
Mingze Dong, Leda Wang, Yuval Kluger
25 Sep 2025

Spelling-out is not Straightforward: LLMs' Capability of Tokenization from Token to Characters
Tatsuya Hiraoka, Kentaro Inui
12 Jun 2025

EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte M. Alves, André F. T. Martins, Ayoub Hammal, ..., Maxime Peyrard, Nuno M. Guerreiro, Patrick Fernandes, Ricardo Rei, Pierre Colombo
07 Mar 2025

Task-Informed Anti-Curriculum by Masking Improves Downstream Performance on Text
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Andrei Jarca, Florinel-Alin Croitoru, Radu Tudor Ionescu
18 Feb 2025

GPT or BERT: why not both?
Lucas Georges Gabriel Charpentier, David Samuel
31 Dec 2024

DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Bamberger, Ofek Glick, Chaim Baskin, Yonatan Belinkov
13 May 2024

MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Neural Information Processing Systems (NeurIPS), 2023
Jacob P. Portes, Alex Trott, Sam Havens, Daniel King, Abhinav Venigalla, Moin Nadeem, Nikhil Sardana, D. Khudia, Jonathan Frankle
29 Dec 2023