Dynamic Masking Rate Schedules for MLM Pretraining

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
24 May 2023
Zachary Ankner
Naomi Saphra
Davis W. Blalock
Jonathan Frankle
Matthew L. Leavitt
arXiv (abs) · PDF · HTML
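As context for this listing: a dynamic masking rate schedule varies the fraction of tokens masked for the MLM objective over the course of pretraining, rather than fixing it at a constant value (e.g., 15%). The sketch below is a minimal illustration under assumed settings, a linear decay from 30% to 15% with hypothetical helper names and step counts; it is not taken from the paper's implementation.

```python
import torch

def masking_rate(step, total_steps, start_rate=0.30, end_rate=0.15):
    """Linearly decay the masking rate over training.

    The endpoints (30% -> 15%) and the linear form are illustrative
    assumptions, not the schedule reported in the paper.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return start_rate + frac * (end_rate - start_rate)

def mask_tokens(input_ids, mask_token_id, rate, special_tokens_mask=None):
    """Randomly replace a `rate` fraction of tokens with [MASK] for MLM."""
    probs = torch.full(input_ids.shape, rate)
    if special_tokens_mask is not None:
        # Never mask special tokens such as [CLS] or [SEP].
        probs.masked_fill_(special_tokens_mask.bool(), 0.0)
    masked = torch.bernoulli(probs).bool()
    labels = input_ids.clone()
    labels[~masked] = -100            # compute loss only on masked positions
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id
    return corrupted, labels

# Hypothetical per-step usage inside a pretraining loop:
# rate = masking_rate(step, total_steps)
# inputs, labels = mask_tokens(batch_input_ids, tokenizer.mask_token_id, rate)
```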

Papers citing "Dynamic Masking Rate Schedules for MLM Pretraining"

8 / 8 papers shown
Chinese ModernBERT with Whole-Word Masking
Zeyu Zhao, Ningtao Wang, Xing Fu, Yu-Jie Cheng
14 Oct 2025

Understanding and Enhancing Mask-Based Pretraining towards Universal Representations
Mingze Dong, Leda Wang, Yuval Kluger
25 Sep 2025

Spelling-out is not Straightforward: LLMs' Capability of Tokenization from Token to Characters
Tatsuya Hiraoka, Kentaro Inui
12 Jun 2025

EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte M. Alves, André F. T. Martins, Ayoub Hammal, ..., Maxime Peyrard, Nuno M. Guerreiro, Patrick Fernandes, Ricardo Rei, Pierre Colombo
07 Mar 2025

Task-Informed Anti-Curriculum by Masking Improves Downstream Performance on Text
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Andrei Jarca, Florinel-Alin Croitoru, Radu Tudor Ionescu
18 Feb 2025

GPT or BERT: why not both?
Lucas Georges Gabriel Charpentier, David Samuel
31 Dec 2024

DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Bamberger, Ofek Glick, Chaim Baskin, Yonatan Belinkov
13 May 2024

MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Neural Information Processing Systems (NeurIPS), 2023
Jacob P. Portes, Alex Trott, Sam Havens, Daniel King, Abhinav Venigalla, Moin Nadeem, Nikhil Sardana, D. Khudia, Jonathan Frankle
29 Dec 2023