You should evaluate your language model on marginal likelihood over tokenisations

6 September 2021

Papers citing "You should evaluate your language model on marginal likelihood over tokenisations"


Title | Authors | Tag | Counts | Date
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo | João Loula, Benjamin LeBrun, Li Du, Ben Lipkin, Clemente Pasti, ..., Ryan Cotterell, Vikash K. Mansinghka, Alexander K. Lew, Tim Vieira, Timothy J. O'Donnell | | 32 1 0 | 17 Apr 2025
What is the best recipe for character-level encoder-only modelling? | Kris Cao | | 32 2 0 | 09 May 2023
Language Modelling with Pixels | Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott | VLM | 32 46 0 | 14 Jul 2022
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP | Sabrina J. Mielke, Zaid Alyafeai, Elizabeth Salesky, Colin Raffel, Manan Dey, ..., Arun Raja, Chenglei Si, Wilson Y. Lee, Benoît Sagot, Samson Tan | | 30 140 0 | 20 Dec 2021
Morphology Matters: A Multilingual Language Modeling Analysis | Hyunji Hayley Park, Katherine J. Zhang, Coleman Haley, K. Steimel, Han Liu, Lane Schwartz | | 42 47 0 | 11 Dec 2020