ResearchTrend.AI
arXiv: 2306.16837
A Formal Perspective on Byte-Pair Encoding

29 June 2023
Vilém Zouhar
Clara Meister
Juan Luis Gastaldi
Li Du
Tim Vieira
Mrinmaya Sachan
Ryan Cotterell

Papers citing "A Formal Perspective on Byte-Pair Encoding"

10 / 10 papers shown
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
Wei Liu
Jingyong Hou
Dong Yang
Muyong Cao
Tan Lee
70
1
0
10 Jan 2025
Morphological Typology in BPE Subword Productivity and Language Modeling
Iñigo Parra
34
0
0
31 Oct 2024
Tokenization as Finite-State Transduction
Marco Cognetta
Naoaki Okazaki
21
0
0
21 Oct 2024
Batching BPE Tokenization Merges
Alexander P. Morgan
30
0
0
05 Aug 2024
Improving Self Consistency in LLMs through Probabilistic Tokenization
Ashutosh Sathe
Divyanshu Aggarwal
Sunayana Sitaram
37
4
0
04 Jul 2024
A cost minimization approach to fix the vocabulary size in a tokenizer for an End-to-End ASR system
Sunil Kumar Kopparapu
Ashish Panda
28
0
0
29 Apr 2024
MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling
Tomasz Limisiewicz
Terra Blevins
Hila Gonen
Orevaoghene Ahia
Luke Zettlemoyer
30
13
0
15 Mar 2024
Two Counterexamples to Tokenization and the Noiseless Channel
Marco Cognetta
Vilém Zouhar
Sangwhan Moon
Naoaki Okazaki
27
0
0
22 Feb 2024
LLM4VV: Developing LLM-Driven Testsuite for Compiler Validation
Christian Munley
Aaron Jarmusch
Sunita Chandrasekaran
27
16
0
08 Oct 2023
Tokenization and the Noiseless Channel
Vilém Zouhar
Clara Meister
Juan Luis Gastaldi
Li Du
Mrinmaya Sachan
Ryan Cotterell
30
31
0
29 Jun 2023