ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.13707
  4. Cited By
Do All Languages Cost the Same? Tokenization in the Era of Commercial
  Language Models

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models

23 May 2023
Orevaoghene Ahia
Sachin Kumar
Hila Gonen
Jungo Kasai
David R. Mortensen
Noah A. Smith
Yulia Tsvetkov
ArXivPDFHTML

Papers citing "Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models"

50 / 73 papers shown
Title
Crosslingual Reasoning through Test-Time Scaling
Crosslingual Reasoning through Test-Time Scaling
Zheng-Xin Yong
Muhammad Farid Adilazuarda
Jonibek Mansurov
Ruochen Zhang
Niklas Muennighoff
Carsten Eickhoff
Genta Indra Winata
Julia Kreutzer
Stephen H. Bach
Alham Fikri Aji
LRM
ELM
57
0
0
08 May 2025
Can you map it to English? The Role of Cross-Lingual Alignment in Multilingual Performance of LLMs
Can you map it to English? The Role of Cross-Lingual Alignment in Multilingual Performance of LLMs
Kartik Ravisankar
HyoJung Han
Marine Carpuat
26
0
0
13 Apr 2025
SuperBPE: Space Travel for Language Models
SuperBPE: Space Travel for Language Models
Alisa Liu
J. Hayase
Valentin Hofmann
Sewoong Oh
Noah A. Smith
Yejin Choi
43
1
0
17 Mar 2025
Multidimensional Consistency Improves Reasoning in Language Models
Huiyuan Lai
Xiao Zhang
Malvina Nissim
LRM
36
0
0
04 Mar 2025
MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages
Chen Zhang
Mingxu Tao
Zhiyuan Liao
Yansong Feng
33
0
0
03 Mar 2025
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
Shahriar Kabir Nahin
R. N. Nandi
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Kowsher
Apu Chandraw Shill
Md Ibrahim
Mehadi Hasan Menon
Tareq Al Muntasir
Firoj Alam
66
0
0
24 Feb 2025
Tokenization is Sensitive to Language Variation
Tokenization is Sensitive to Language Variation
Anna Wegmann
Dong Nguyen
David Jurgens
75
1
0
24 Feb 2025
Beyond Release: Access Considerations for Generative AI Systems
Beyond Release: Access Considerations for Generative AI Systems
Irene Solaiman
Rishi Bommasani
Dan Hendrycks
Ariel Herbert-Voss
Yacine Jernite
Aviya Skowron
Andrew Trask
58
1
0
23 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
53
1
0
21 Feb 2025
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages
Shreyan Biswas
Alexander Erlei
U. Gadiraju
101
2
0
13 Feb 2025
DateLogicQA: Benchmarking Temporal Biases in Large Language Models
DateLogicQA: Benchmarking Temporal Biases in Large Language Models
Gagan Bhatia
MingZe Tang
Cristina Mahanta
Madiha Kazi
71
0
0
17 Dec 2024
Efficient Continual Pre-training of LLMs for Low-resource Languages
Efficient Continual Pre-training of LLMs for Low-resource Languages
Arijit Nag
Soumen Chakrabarti
Animesh Mukherjee
Niloy Ganguly
67
0
0
13 Dec 2024
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini
Shikhar Murty
Christopher D. Manning
Christopher Potts
Róbert Csordás
24
2
0
28 Oct 2024
Responsible Multilingual Large Language Models: A Survey of Development,
  Applications, and Societal Impact
Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact
Junhua Liu
Bin Fu
LRM
26
1
0
23 Oct 2024
Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework
Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework
Zhengwei Yang
Yuke Li
Qiang Sun
Basura Fernando
Heng-Chiao Huang
Zheng Wang
21
1
0
14 Oct 2024
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
HyoJung Han
Akiko Eriguchi
Haoran Xu
Hieu T. Hoang
Marine Carpuat
Huda Khayrallah
VLM
32
2
0
12 Oct 2024
Stereotype or Personalization? User Identity Biases Chatbot
  Recommendations
Stereotype or Personalization? User Identity Biases Chatbot Recommendations
Anjali Kantharuban
Jeremiah Milbauer
Emma Strubell
Graham Neubig
21
9
0
08 Oct 2024
From Tokens to Words: On the Inner Lexicon of LLMs
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
39
12
0
08 Oct 2024
Gradient Routing: Masking Gradients to Localize Computation in Neural
  Networks
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
Alex Cloud
Jacob Goldman-Wetzler
Evžen Wybitul
Joseph Miller
Alexander Matt Turner
14
3
0
06 Oct 2024
Towards Safe Multilingual Frontier AI
Towards Safe Multilingual Frontier AI
Artūrs Kanepajs
Vladimir Ivanov
Richard Moulange
18
1
0
06 Sep 2024
Evaluating Cultural Adaptability of a Large Language Model via
  Simulation of Synthetic Personas
Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas
Louis Kwok
Michal Bravansky
Lewis D. Griffin
34
11
0
13 Aug 2024
Data Mixture Inference: What do BPE Tokenizers Reveal about their
  Training Data?
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
J. Hayase
Alisa Liu
Yejin Choi
Sewoong Oh
Noah A. Smith
27
9
0
23 Jul 2024
Cross-Lingual Multi-Hop Knowledge Editing
Cross-Lingual Multi-Hop Knowledge Editing
Aditi Khandelwal
Harman Singh
Hengrui Gu
Tianlong Chen
Kaixiong Zhou
KELM
26
0
0
14 Jul 2024
MAGNET: Improving the Multilingual Fairness of Language Models with
  Adaptive Gradient-Based Tokenization
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization
Orevaoghene Ahia
Sachin Kumar
Hila Gonen
Valentin Hoffman
Tomasz Limisiewicz
Yulia Tsvetkov
Noah A. Smith
25
4
0
11 Jul 2024
A Principled Framework for Evaluating on Typologically Diverse Languages
A Principled Framework for Evaluating on Typologically Diverse Languages
Esther Ploeger
Wessel Poelman
Andreas Holck Høeg-Petersen
Anders Schlichtkrull
Miryam de Lhoneux
Johannes Bjerva
33
1
0
06 Jul 2024
How Does Quantization Affect Multilingual LLMs?
How Does Quantization Affect Multilingual LLMs?
Kelly Marchisio
Saurabh Dash
Hongyu Chen
Dennis Aumiller
A. Ustun
Sara Hooker
Sebastian Ruder
MQ
44
6
0
03 Jul 2024
Understanding and Mitigating Language Confusion in LLMs
Understanding and Mitigating Language Confusion in LLMs
Kelly Marchisio
Wei-Yin Ko
Alexandre Berard
Théo Dehaze
Sebastian Ruder
49
23
0
28 Jun 2024
CaLMQA: Exploring culturally specific long-form question answering
  across 23 languages
CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Shane Arora
Marzena Karpinska
Hung-Ting Chen
Ipsita Bhattacharjee
Mohit Iyyer
Eunsol Choi
HILM
37
11
0
25 Jun 2024
Towards Fast Multilingual LLM Inference: Speculative Decoding and
  Specialized Drafters
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
Euiin Yi
Taehyeon Kim
Hongseok Jeung
Du-Seong Chang
Se-Young Yun
38
4
0
24 Jun 2024
Exploring Design Choices for Building Language-Specific LLMs
Exploring Design Choices for Building Language-Specific LLMs
Atula Tejaswi
Nilesh Gupta
Eunsol Choi
22
3
0
20 Jun 2024
How Multilingual Are Large Language Models Fine-Tuned for Translation?
How Multilingual Are Large Language Models Fine-Tuned for Translation?
Aquia Richburg
Marine Carpuat
LRM
25
4
0
30 May 2024
Aya 23: Open Weight Releases to Further Multilingual Progress
Aya 23: Open Weight Releases to Further Multilingual Progress
Viraat Aryabumi
John Dang
Dwarak Talupuru
Saurabh Dash
David Cairuz
...
Aidan N. Gomez
Phil Blunsom
Marzieh Fadaee
A. Ustun
Sara Hooker
OSLM
44
72
0
23 May 2024
Targeted Multilingual Adaptation for Low-resource Language Families
Targeted Multilingual Adaptation for Low-resource Language Families
C.M. Downey
Terra Blevins
Dhwani Serai
Dwija Parikh
Shane Steinert-Threlkeld
27
2
0
20 May 2024
Risks and Opportunities of Open-Source Generative AI
Risks and Opportunities of Open-Source Generative AI
Francisco Eiras
Aleksander Petrov
Bertie Vidgen
Christian Schroeder
Fabio Pizzati
...
Matthew Jackson
Phillip H. S. Torr
Trevor Darrell
Y. Lee
Jakob N. Foerster
37
18
0
14 May 2024
Zero-Shot Tokenizer Transfer
Zero-Shot Tokenizer Transfer
Benjamin Minixhofer
E. Ponti
Ivan Vulić
VLM
36
8
0
13 May 2024
Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing
  Japanese Language Capabilities
Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities
Kazuki Fujii
Taishi Nakamura
Mengsay Loem
Hiroki Iida
Masanari Ohi
Kakeru Hattori
Hirai Shota
Sakae Mizuki
Rio Yokota
Naoaki Okazaki
CLL
22
53
0
27 Apr 2024
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras
Aleksandar Petrov
Bertie Vidgen
Christian Schroeder de Witt
Fabio Pizzati
...
Paul Röttger
Philip H. S. Torr
Trevor Darrell
Y. Lee
Jakob N. Foerster
33
5
0
25 Apr 2024
IndicGenBench: A Multilingual Benchmark to Evaluate Generation
  Capabilities of LLMs on Indic Languages
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages
Harman Singh
Nitish Gupta
Shikhar Bharadwaj
Dinesh Tewari
Partha P. Talukdar
ELM
24
22
0
25 Apr 2024
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
Kevin Slagle
27
3
0
22 Apr 2024
The Role of Language Imbalance in Cross-lingual Generalisation: Insights
  from Cloned Language Experiments
The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments
Anton Schäfer
Shauli Ravfogel
Thomas Hofmann
Tiago Pimentel
Imanol Schlag
55
3
0
11 Apr 2024
Understanding Cross-Lingual Alignment -- A Survey
Understanding Cross-Lingual Alignment -- A Survey
Katharina Hämmerl
Jindvrich Libovický
Alexander M. Fraser
36
2
0
09 Apr 2024
Cendol: Open Instruction-tuned Generative Large Language Models for
  Indonesian Languages
Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Rifki Afina Putri
Emmanuel Dave
...
Bryan Wilie
Genta Indra Winata
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
42
15
0
09 Apr 2024
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Bo Peng
Daniel Goldstein
Quentin G. Anthony
Alon Albalak
Eric Alcaide
...
Bingchen Zhao
Qihang Zhao
Peng Zhou
Jian Zhu
Ruijie Zhu
43
73
0
08 Apr 2024
Multilingual Large Language Model: A Survey of Resources, Taxonomy and
  Frontiers
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin
Qiguang Chen
Yuhang Zhou
Zhi Chen
Yinghui Li
Lizi Liao
Min Li
Wanxiang Che
Philip S. Yu
LRM
47
35
0
07 Apr 2024
HyperCLOVA X Technical Report
HyperCLOVA X Technical Report
Kang Min Yoo
Jaegeun Han
Sookyo In
Heewon Jeon
Jisu Jeong
...
Hyunkyung Noh
Se-Eun Choi
Sang-Woo Lee
Jung Hwa Lim
Nako Sung
VLM
25
8
0
02 Apr 2024
BEnQA: A Question Answering and Reasoning Benchmark for Bengali and
  English
BEnQA: A Question Answering and Reasoning Benchmark for Bengali and English
H. M. Q. H. Sheikh Shafayat
Rishav Hada
Isaac Cowhey
Rifki Afina
Jerry Tworek
Lorie De Leon
24
3
0
16 Mar 2024
MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual
  Language Modeling
MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling
Tomasz Limisiewicz
Terra Blevins
Hila Gonen
Orevaoghene Ahia
Luke Zettlemoyer
22
12
0
15 Mar 2024
Cost-Performance Optimization for Processing Low-Resource Language Tasks
  Using Commercial LLMs
Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs
Arijit Nag
Animesh Mukherjee
Niloy Ganguly
Soumen Chakrabarti
20
2
0
08 Mar 2024
What Is Missing in Multilingual Visual Reasoning and How to Fix It
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song
Simran Khanuja
Graham Neubig
VLM
LRM
82
6
0
03 Mar 2024
A Bit of a Problem: Measurement Disparities in Dataset Sizes Across
  Languages
A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages
Catherine Arnett
Tyler A. Chang
Benjamin Bergen
11
3
0
01 Mar 2024
12
Next