Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies

19 December 2023
Anaelia Ovalle
Ninareh Mehrabi
Palash Goyal
Jwala Dhamala
Kai-Wei Chang
Richard Zemel
Aram Galstyan
Yuval Pinter
Rahul Gupta

Papers citing "Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies"

9 / 9 papers shown
Agree to Disagree? A Meta-Evaluation of LLM Misgendering
Arjun Subramonian
Vagrant Gautam
Preethi Seshadri
Dietrich Klakow
Kai-Wei Chang
Yizhou Sun
27
1
0
23 Apr 2025
A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications
Sunayana Sitaram
Adrian de Wynter
Isobel McCrum
Qilong Gu
Si-Qing Chen
AILaw
104
0
0
26 Mar 2025
Adversarial Tokenization
Renato Lui Geh
Zilei Shao
Guy Van den Broeck
SILM
AAML
87
0
0
04 Mar 2025
Robust Bias Detection in MLMs and its Application to Human Trait Ratings
Ingroj Shrestha
Louis Tay
Padmini Srinivasan
78
0
0
24 Feb 2025
Where is the signal in tokenization space?
Renato Lui Geh
Honghua Zhang
Kareem Ahmed
Benjie Wang
Guy Van den Broeck
25
4
0
16 Aug 2024
Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?
Vagrant Gautam
Eileen Bingert
D. Zhu
Anne Lauscher
Dietrich Klakow
43
8
0
04 Apr 2024
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
Omri Uzan
Craig W. Schmidt
Chris Tanner
Yuval Pinter
38
14
0
02 Mar 2024
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
248
1,986
0
31 Dec 2020
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
69
235
0
31 Dec 2020