Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2210.07095
Cited By
Incorporating Context into Subword Vocabularies
13 October 2022
Shaked Yehezkel
Yuval Pinter
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Incorporating Context into Subword Vocabularies"
12 / 12 papers shown
Title
UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security
Binghui Wu
D. Divakaran
M. Gurusamy
57
0
0
06 Mar 2025
Linguistic Laws Meet Protein Sequences: A Comparative Analysis of Subword Tokenization Methods
Burak Suyunu
Enes Taylan
Arzucan Özgür
62
1
0
26 Nov 2024
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan
Matanel Oren
Yuval Reif
Roy Schwartz
39
12
0
08 Oct 2024
Infusing clinical knowledge into tokenisers for language models
Abul Hasan
Jinge Wu
Quang Ngoc Nguyen
Salomé Andres
Imane Guellil
Huayu Zhang
Arlene Casey
Beatrice Alex
Bruce Guthrie
Honghan Wu
25
1
0
20 Jun 2024
PatternGPT :A Pattern-Driven Framework for Large Language Model Text Generation
Le Xiao
Xin Shan
19
4
0
02 Jul 2023
MaxMatch-Dropout: Subword Regularization for WordPiece
Tatsuya Hiraoka
27
8
0
09 Sep 2022
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
M. Krass
Lucia Zheng
Neel Guha
Christopher D. Manning
Dan Jurafsky
Daniel E. Ho
AILaw
ELM
129
94
0
01 Jul 2022
Improving Tokenisation by Alternative Treatment of Spaces
Edward Gow-Smith
Harish Tayyar Madabushi
Carolina Scarton
Aline Villavicencio
19
20
0
08 Apr 2022
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
69
235
0
31 Dec 2020
Improving Multilingual Models with Language-Clustered Vocabularies
Hyung Won Chung
Dan Garrette
Kiat Chuan Tan
Jason Riesa
VLM
58
65
0
24 Oct 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,724
0
26 Sep 2016
1