2312.11779
Cited By
Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies
19 December 2023
Anaelia Ovalle
Ninareh Mehrabi
Palash Goyal
Jwala Dhamala
Kai-Wei Chang
Richard Zemel
Aram Galstyan
Yuval Pinter
Rahul Gupta
Papers citing
"Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies"
9 / 9 papers shown
Agree to Disagree? A Meta-Evaluation of LLM Misgendering
Arjun Subramonian
Vagrant Gautam
Preethi Seshadri
Dietrich Klakow
Kai-Wei Chang
Yizhou Sun
27
1
0
23 Apr 2025
A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications
Sunayana Sitaram
Adrian de Wynter
Isobel McCrum
Qilong Gu
Si-Qing Chen
AILaw
104
0
0
26 Mar 2025
Adversarial Tokenization
Renato Lui Geh
Zilei Shao
Guy Van den Broeck
SILM
AAML
87
0
0
04 Mar 2025
Robust Bias Detection in MLMs and its Application to Human Trait Ratings
Ingroj Shrestha
Louis Tay
Padmini Srinivasan
78
0
0
24 Feb 2025
Where is the signal in tokenization space?
Renato Lui Geh
Honghua Zhang
Kareem Ahmed
Benjie Wang
Guy Van den Broeck
25
4
0
16 Aug 2024
Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?
Vagrant Gautam
Eileen Bingert
D. Zhu
Anne Lauscher
Dietrich Klakow
43
8
0
04 Apr 2024
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Omri Uzan
Craig W. Schmidt
Chris Tanner
Yuval Pinter
38
14
0
02 Mar 2024
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
248
1,986
0
31 Dec 2020
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
69
235
0
31 Dec 2020