ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Greed is All You Need: An Evaluation of Tokenizer Inference Methods

2 March 2024
Omri Uzan
Craig W. Schmidt
Chris Tanner
Yuval Pinter

Papers citing "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"

Cross-Tokenizer Distillation via Approximate Likelihood Matching
Benjamin Minixhofer
Ivan Vulić
E. Ponti
25 Mar 2025
Splintering Nonconcatenative Languages for Better Tokenization
Bar Gazit
Shaltiel Shmidman
Avi Shmidman
Yuval Pinter
18 Mar 2025
Tokenization is Sensitive to Language Variation
Anna Wegmann
Dong Nguyen
David Jurgens
24 Feb 2025
Hit the Sweet Spot! Span-Level Ensemble for Large Language Models
Yangyifan Xu
Jianghao Chen
Junhong Wu
Jiajun Zhang
MoE
27 Sep 2024
Zero-Shot Tokenizer Transfer
Benjamin Minixhofer
E. Ponti
Ivan Vulić
VLM
13 May 2024
Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge
Khuyagbaatar Batsuren
Ekaterina Vylomova
Verna Dankers
Tsetsuukhei Delgerbaatar
Omri Uzan
Yuval Pinter
Gábor Bella
20 Apr 2024
An Analysis of BPE Vocabulary Trimming in Neural Machine Translation
Marco Cognetta
Tatsuya Hiraoka
Naoaki Okazaki
Rico Sennrich
Yuval Pinter
30 Mar 2024
Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement
Catherine Arnett
Pamela D. Rivière
Tyler A. Chang
Sean Trott
20 Mar 2024
Tokenization Is More Than Compression
Craig W. Schmidt
Varshini Reddy
Haoran Zhang
Alec Alameddine
Omri Uzan
Yuval Pinter
Chris Tanner
28 Feb 2024
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
26 Sep 2016